1.0 Introduction


A Phase II SBIR contract was awarded to Space Tech Corporation to
develop a new computer architecture for WSMR STEWS-ID-TA. Foo Lam was
technical monitor and was assisted by John Williams. Michael Andrews was the
principal investigator at Space Tech. Several Space Tech employees were
involved with this effort. Steve Hall was responsible for the early design
concepts of the CPH. Larry Hall was responsible for the VPH design effort.
Jeff Weideman worked on the cache, address generator, IOP, and VMS buffer
boards. James Ott worked on the cache board. Phil White tested the crossbars
and finalized the backplane design. John Stevens generated the I/O drivers and
Steve Sharp contributed to the VPH coding.

Major DOD agencies have found that upgrading their hardware development
systems to keep up with advancing technology remains a large effort. Yet a
major hidden cost goes beyond the simple acquisition of equipment: engineer
retraining and software redevelopment easily magnify the total system cost.
In early 1980, Foo Lam at the Instrumentation Directorate at White Sands
Missile Range devised an innovative solution: build a hardware emulator that
can be applied across several lifetimes of architectural technologies by
modifying only the microcode. Hence, a fixed and constant cost remains, in
contrast to an escalating level of effort each time the next hottest
microprocessor comes out.

White Sands Missile Range, like most other test ranges, must constantly
upgrade computing facilities to take advantage of cost-effective solutions. A
proliferation of different microprocessors and development systems spread
among the several laboratories reduces the commonality of effort. Code
written in one application is likely to be unsuitable for another. Testing
such code is also challenging when dissimilar hardware is encountered. A type
of universal or meta-machine would help minimize portability constraints.

In response to this need, Lam's meta-architecture was devised; it can
emulate many diverse types of microprocessors, from RISC to CISC. Aptly
called the Cascadable Processor Hardware, the CPH machine can be easily
microcoded. More importantly, the architecture can be made to emulate any
wordlength from 8 to 128 bits. Fixed-point and floating-point arithmetic in
IEEE and DEC formats is supported. Special fast DSP routines are microcoded
so that only calling routines need be executed. And because of the microcode
capability, a user can program in the language of the desired microprocessor.
Two significant cost savings accrue. First, the Army proponent need no longer
purchase costly development systems each time another microprocessor is to be
incorporated. Second, real-time emulation need not be sacrificed, because the
CPH is really a sixth-generation architecture, largely capable of emulating
architectures into the early 2000s.

Initial architectural studies were completed by Dr. Javin Taylor at New
Mexico State University. Later, Space Tech Corporation was awarded Phase I
and Phase II efforts to respond to this requirement. As a result, a novel
architecture was designed that is fast, flexible, and cascadable. It achieves
the long-term goals of Mr. Lam's visionary architecture. Cascadability is
supported by merely plugging another processor into the backplane; no new
microcode is necessary.

The heart of the architecture is a fully concurrent crossbar chip. The
novel chip is a 12x14 port switcher which can be dynamically configured in
only one clock cycle (currently 20 nsec). The chip is also directly
cascadable so that extensible wordlengths can be supported in hardware with no
software cycle penalties. The crossbar chip is employed in both the processor
section and the address generator section, which attests to its universality.
The crossbar should find equal application in modem switchers, beam
splitters, antenna beam formers, telemetry, telephony, and massively parallel
processing architectures.
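
The behavior of such a port switcher can be illustrated with a small software
model. The sketch below is not the chip's actual logic; it assumes 12 input
ports and 14 output ports, treats one call to configure() as the single-cycle
reconfiguration, and models cascading as identically configured slices that
each carry part of a wider word. The names Crossbar and cascaded_route are
hypothetical.

```python
class Crossbar:
    """Toy model of a crossbar switch (assumed: 12 inputs, 14 outputs)."""

    def __init__(self, n_inputs=12, n_outputs=14):
        self.n_inputs = n_inputs
        self.n_outputs = n_outputs
        # select[j] is the input port driving output j (None means idle).
        self.select = [None] * n_outputs

    def configure(self, mapping):
        """Replace the whole routing pattern at once (one 'clock cycle')."""
        new_select = [None] * self.n_outputs
        for out_port, in_port in mapping.items():
            if not 0 <= out_port < self.n_outputs:
                raise ValueError(f"bad output port {out_port}")
            if not 0 <= in_port < self.n_inputs:
                raise ValueError(f"bad input port {in_port}")
            new_select[out_port] = in_port
        self.select = new_select

    def route(self, inputs):
        """Return the value seen on every output port for the given inputs."""
        if len(inputs) != self.n_inputs:
            raise ValueError("one value per input port is required")
        return [None if s is None else inputs[s] for s in self.select]


def cascaded_route(slices, inputs, slice_bits=32):
    """Route wide words through several identically configured slices.

    Slice k carries bits [k*slice_bits, (k+1)*slice_bits) of every word,
    so adding a slice widens the word without extra routing steps.
    """
    mask = (1 << slice_bits) - 1
    result = [None] * slices[0].n_outputs
    for k, xbar in enumerate(slices):
        part = [(word >> (k * slice_bits)) & mask for word in inputs]
        for j, value in enumerate(xbar.route(part)):
            if value is not None:
                result[j] = (result[j] or 0) | (value << (k * slice_bits))
    return result
```

In this model, widening the datapath from 32 to 64 bits only means passing
two slices to cascaded_route; the routing pattern itself is unchanged.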

During this Phase II effort, a microprogramming development tool called
MICROASM was designed. This tool development was jointly funded by a Phase II
SBIR contract with WSMR and by SDC-Huntsville. Mr. K. Pathak sponsored this
work at SDC.

This report is organized as follows. The early sections describe the
developmental history of the Phase I and II projects. Reading them is helpful
for understanding the eventual device selections for the functional units. A
very brief description of the units and the overall EVA architecture can be
found in Section 2 as well. Section 3 begins the detailed explanation of the
resources, including the operation of those modules that have been fabricated,
such as the VPH. Section 4 introduces some of the concepts in programming
the CPH. Section 5 describes the microprogramming tool, MicroAsm, which will
be important when the CPH is to be coded. Sections 6 and 7 discuss the
results and suggestions for future work.

1.1 Developmental History of EVA (Extendable Vector Architecture)

EVA is an extended vector architecture computer. It consists of two
major functional subsystems, the CPH and the VPH. The CPH architecture
evolved over the course of a ten-year period, culminating in the current
Phase I and Phase II SBIR effort. EVA is designed to support a cascadable
system whereby users can insert multiple CPH boards into the system and extend
the wordlength. The architecture has been in development over several device
technology evolutions. It has seen change from the first 8-bit slice AMD 2900
chips through the current 64-bit slice BIT 2120 multipliers. That it has
withstood change over these years attests to its conceptual strength. These
developmental efforts are described next and will be important to the reader
when the current architectural issues are discussed.

1.1.1 Phase I Research Effort

Details of the Phase I effort are found in the Phase I Final Technical
Report. The technical objectives are cited next to outline the steps that
were taken during Phase I.

1. Study and organize the EVA architecture into efficiently coupled
modules for radar and signal processing. In this step, data transfer
techniques were investigated to increase I/O transfers at the chip and board
levels. Optimal trade-offs were determined among engineering parameters of
power, board size, and speed of operation so as to render EVA machinery fast
and efficient for laboratory and range instrumentation applications.

2. Determine the optimal trade-offs between fixed-point and floating-point
number systems. Also, analyze the rounding and truncation issues and/or
the overflow and underflow issues with respect to fixed-point and
floating-point operations in the EVA. The objective was to identify efficient
wordlengths for signal processors in EVA-like architectures.

3.  Study  optimal  ALU  configurations  that  speed  up  signal  processing  in 
EVA  architectures.  The  objective  here  was  to  determine  the  ideal 
configuration  (16x16,  32x32,  or  larger  multipliers)  which  supports  the 
processing  bandwidths  required. 

4. Research the usage of fast controller circuits that may utilize
centralized or distributed PLAs. The objective of this step was to improve
arithmetic processing speeds while reducing or at least maintaining low
control wire count from the control unit to the control points in the
architecture.

5. Research microprograms for fixed-point and floating-point signal
processing algorithms executable on EVA architectures. The objective was to
develop sets of signal processing micro-routines that could be ported across
architectural changes.

The following sections describe the efforts undertaken at Space Tech
Corporation (STC) to satisfy the objectives set forth above. The basic
architecture for the EVA organization as determined from Phase I is shown in
Figure 1. The basic architecture derived for the VPH in Phase I follows in
Figure 2. During Phase II the VPH architecture was modified to include a
better VME interface controller chip, the MVME 6000, and PALs were used
instead of the Motorola BAMs for speed reasons. The remainder of the VPH
retained much of its Phase I characterization during Phase II. In fact, the
final VPH design exceeded its Phase I speed estimates for the 1K FFT: the
730 usec benchmark was reduced to 604 usec in the final Phase II architecture.

The EVA is an architecture concept whereby high-speed yet versatile and
efficient computations are a must. In order to reach an acceptable compromise
between these conflicting needs, the process of selecting the building blocks
for each component of the EVA architecture considered several issues.
Minimum/maximum cascadable increments (8, 16, or 32 bits; CPH only), execution
speed, versatility, availability, amount of "glue logic" needed, overall chip
count, and maximum utilization of available resources are just a
representative sample of the issues considered.

Figure 1 depicts the block diagram of the 32-bit EVA architecture
containing the Vector Processing Hardware (VPH) and the Cascadable Processing
Hardware (CPH) modules. It has been determined that all of the modules will
connect to the VMEbus. VMEbus data transfers between modules can handle
up to 32 bits in one transfer; however, the CPH allows up to 64-bit on-board
data manipulations when two CPH modules are incorporated. One CPH module will
support up to 32-bit wordlengths. This cascadability allows users to maximize
the use of available resources.
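
The effect of cascading two 32-bit CPH modules into a 64-bit datapath can be
sketched in software as slices that pass a carry from one to the next. This
is only an illustration of the wordlength-extension idea, not the CPH's
actual interconnect; slice_add and cascaded_add are hypothetical names.

```python
SLICE_BITS = 32                      # wordlength handled by one CPH module
SLICE_MASK = (1 << SLICE_BITS) - 1


def slice_add(a, b, carry_in):
    """Add two slice-wide operands; return (sum, carry_out)."""
    total = a + b + carry_in
    return total & SLICE_MASK, total >> SLICE_BITS


def cascaded_add(a, b, n_slices):
    """Add two words of n_slices * 32 bits using chained slices.

    The carry out of slice k feeds the carry in of slice k + 1, which is
    the role the backplane connection plays between cascaded modules.
    """
    result, carry = 0, 0
    for k in range(n_slices):
        shift = k * SLICE_BITS
        part, carry = slice_add((a >> shift) & SLICE_MASK,
                                (b >> shift) & SLICE_MASK,
                                carry)
        result |= part << shift
    return result, carry
```

With one slice, adding 1 to 0xFFFFFFFF wraps to zero with a carry out; with
two slices the same operands produce the 64-bit result 0x100000000.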
The VPH is ideally suited for high-speed signal-processing applications
where efficient, complex-data number-crunching is of the utmost importance.
The heart of the VPH (the ZR34325, also referred to as the VSP-325 and shown in
Figure 3) is capable of executing high-level, vector-oriented instructions
which embed the DSP algorithms directly into the device, allowing efficient
algorithm execution. Moreover, a VSP-325 based architecture facilitates
algorithm partitioning in the sense that multiple VSP-325s can be paralleled
in order to share in the data processing requirements. Hence, while the
VSP-325s perform parallel processing with interleaved I/O on the data from one
RAM section, the host or the CPH can be up-loading or down-loading data into
the other memory bank of the VPH. Once the current activities are completed,
the roles of the VPH memory banks are reversed. This function-swapping is the
primary reason for the efficiency and high throughputs attainable with the
VPH.
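
The bank-swapping scheme described above is commonly called ping-pong (or
double) buffering. A minimal sketch follows, assuming two banks, a host that
loads and unloads whole blocks, and a stand-in per-sample function in place
of the DSP work; PingPongMemory and process_stream are hypothetical names,
and the steps run sequentially here where the hardware would overlap them.

```python
class PingPongMemory:
    """Two memory banks whose host/DSP roles are exchanged after each pass."""

    def __init__(self):
        self.banks = [[], []]
        self.dsp_bank = 0            # bank currently owned by the DSPs

    @property
    def host_bank(self):
        return 1 - self.dsp_bank     # the other bank belongs to the host

    def host_load(self, block):
        self.banks[self.host_bank] = list(block)

    def host_unload(self):
        data = self.banks[self.host_bank]
        self.banks[self.host_bank] = []
        return data

    def dsp_process(self, fn):
        bank = self.banks[self.dsp_bank]
        self.banks[self.dsp_bank] = [fn(x) for x in bank]

    def swap(self):
        self.dsp_bank = self.host_bank


def process_stream(blocks, fn):
    """Push non-empty blocks through the two banks; return processed blocks."""
    mem = PingPongMemory()
    pending = list(blocks)
    results = []
    # Two extra passes drain the blocks still inside the pipeline.
    for step in range(len(pending) + 2):
        finished = mem.host_unload()         # collect results, if any
        if finished:
            results.append(finished)
        if step < len(pending):
            mem.host_load(pending[step])     # host fills its bank ...
        mem.dsp_process(fn)                  # ... while the DSPs work on theirs
        mem.swap()                           # roles of the banks reverse
    return results
```

For example, process_stream([[1, 2], [3, 4]], lambda x: x * 10) returns the
blocks in order, each having spent one pass in the host role and one in the
DSP role.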


In order to fully capitalize on the processing power of an EVA
architecture, the system bus configuration must be equally capable of
interfacing with the host and among modules of the architecture. A study
was made to identify the optimal bus arrangement, one which allows maximum
exploitation of the capabilities of the EVA architecture. The study did not
consider 16-bit bus configurations such as the STD bus, MULTIBUS I, UNIBUS,
and Qbus. The reason is that these systems do not satisfy current DSP and/or
military real-time demands, nor are they capable of supporting the dynamic
range required in such applications.

The Phase I effort concluded with an EVA architecture to support both
DSP via the VPH and cascadability via the CPH. The Phase II effort began a
year later. The gap in time offered STC and WSMR the opportunity to
incorporate new technology advances. Phase II began with a review of those
advances.

1.1.2 Phase II Developmental Effort

Through engineering analysis, STC proposed in Phase II to review,
update, and modify the preliminary EVA designs developed during Phase I of
this effort. The objective was to ensure integration of the latest technology
and design techniques in order to guarantee longevity and usability of EVA
over a wide range of applications. Of paramount importance was the
determination of the optimal number of boards and of the interboard cabling
and control requirements for efficient operation of the cascadable
architecture.

The EVA remains an architectural concept whereby high speed,
versatility, and efficient computation are balanced. The scope of this Phase
II project was to develop a system that incorporates cascadability and
high-speed data and signal processing. The building blocks designed in Phase I
for each component of the EVA were expanded into efficient, working modules.
A signal processing software library, containing algorithms that enhance the
usability of the EVA architecture, was studied but not fully developed.
Targeted applications for the EVA included range instrumentation, radar signal
processing, digital focusing, spectral data processing, Kalman filtering, and
real-time target motion resolution.

In order to fully capitalize on the processing power of an EVA
architecture, the system bus configuration must be equally capable of
interfacing with the host and among modules of the architecture. Phase I
preliminary studies and the Phase II review showed that the VME system
provides the speed, versatility, and generality required in an EVA-like
architecture. STC incorporated this bus configuration within the EVA to allow
maximum exploitation of the architectural capabilities. Moreover, its
asynchronous, non-multiplexed protocol ensures longevity of the system:
faster devices can be incorporated into the system design without having to
redesign or upgrade the interface block. This allows the system performance
to be upgraded as superior technology is developed. In addition, various
processors and peripherals can operate at various speeds without having to
wait for proper timing to get on or off the bus.
Initially, the Phase II proposal identified the cascadable processing
hardware depicted in Figure 4. The VPH and EVA architectures were depicted
in previous figures. During the course of Phase II, the cascadable
processing hardware (CPH) underwent major changes described in Section 1.2.
Those changes came as a result of significant component developments
described next.

1.1.2.1 Significant EVA Component Considerations

From extensive discussions with the WSMR-ID-TA staff, it was determined
that the BIT2110 and BIT2120 devices would serve as the main processing
engines in the CPH. Each is ideally suited as a 32- and 64-bit device. Also,
such devices provide pathways to future ALUs with minor changes to the
microcode and boards. The VPH numerical engine selected was the Zoran 325 DSP
device, which became available during Phase II. The 325 chips performed as
needed. In many cases they exceeded the speeds of other choices such as the
Motorola 56000 and 96000. The AT&T DSP 32C and TI32020 devices were too slow
for the WSMR applications and were discarded early in the design selection
process of Phase II.

During Phase II, GaAs technologies such as the Gazelle serial
transceivers became mature. These GaAs chips provide data transfer rates in
the gigabit-per-second range and serve as the high-speed link between the VPH
and the CPH.
This prompted further investigations into ultra-high-speed buses. The
high-speed I/O or HSIO bus was designed on this basis. This bus, described in
a later section under the CPH/VPH link section, was used to make 32- and
64-bit data transfers among the modules in the CPH. Those modules include the
processor, cache memory, address generator, and IOP.

In 1991, the VPH design was impacted favorably by the introduction of
economical 4-port memories. The 4-port memory circuit shown in Figure 5 made
the VPH board requirements smaller. The device was incorporated into the
design for the program space of the VPH so that the DSPs could share the data
space with the 68020 and the ISA interface. This made a truly versatile
architecture for multiple processing tasks.

Lastly, the EVA architecture became significantly faster when a custom
crossbar was designed by Steve Hall. This crossbar, depicted in Figure 6, was
to make a significant impact on the large-scale integration of the processor
and address generator boards. The original organization was a 12x12
configuration as shown. Later modifications required a 12x14 organization.
Internally, however, the functional areas remain as in this figure.

1.1.2.2 Development of I/O Configuration

Before an in-depth design of the CPH could begin, the host interface
design needed to be investigated. Hence, a major design issue was to
determine how the CPH is to be viewed from the standpoint of the host or
system controller. Three basic schemes, described next, were investigated
early in Phase II. The CPH Bus-Based system was finally chosen.

Primitive Processing Unit


This is the simplest possible view of the CPH. In this scheme, the CPH
functions as a processor with virtually no control intelligence. The host
provides the data to be processed, the microprogram code to be executed, and
explicit control instructions on where in CPH memory to place the data and
microcode and where to begin execution. Output from the CPH to the host would
be handled in a similar fashion. In this scheme, the host/CPH interface would
involve some rudimentary handshaking logic to initiate transfers, and logic to
allow the host to access CPH memory.

Intelligent IOP

This is the next more sophisticated view of the CPH. In this scheme, an
I/O processor (IOP) with a fair level of control intelligence would be
incorporated into the CPH. The IOP would handle all transactions between the
host and CPH. The IOP would have access to the CPH memory space, and would
handle the task of informing the CPH where data is located, where to begin
execution, and all handshaking between host and CPH. In this scheme, the
host/CPH interface would require some processing ability of its own, probably
a microprocessor such as a 68000. Some additional logic to support the
microprocessor would be required.

CPH  Bus-Based  System 

This is the most sophisticated view of the CPH. In this scheme, a
high-speed bus would be developed for the CPH. A bus controller would link the
CPH bus to the CPH backplane. An intelligent interface would link the CPH bus
to the host. All transactions between CPH and host would be handled by both
the host interface and the CPH bus controller. In this scheme, resource
requirements would far exceed those of either method previously outlined.

Impacts, Comparisons, and Additional Considerations

If the primitive approach is taken, CPH throughput will be negatively
affected, since a great deal of system overhead exists for the host to service
the CPH. The tasks of processing and I/O cannot occur concurrently. If the
IOP approach is taken, a marked increase in system throughput can be achieved.
This is largely due to the fact that the IOP can handle I/O tasks while
processing of other data is being done. The increase in throughput may indeed
be significant under this scheme, since the I/O time for a given task is
likely to be comparable to the processing time required. Throughput may be
increased by as much as a factor of two.
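
The factor-of-two bound can be checked with a simple timing model. The sketch
below assumes each task needs one I/O transfer and one processing step, that
the primitive scheme runs them back to back, and that an IOP hides the shorter
of the two behind the longer; the function names and the time units are
illustrative only.

```python
def serial_time(t_io, t_proc, n_tasks):
    """Primitive scheme: I/O and processing never overlap."""
    return n_tasks * (t_io + t_proc)


def overlapped_time(t_io, t_proc, n_tasks):
    """IOP scheme: I/O for the next task overlaps processing of this one.

    The first transfer and the last processing step cannot be hidden;
    every step in between costs the larger of the two times.
    """
    return t_io + (n_tasks - 1) * max(t_io, t_proc) + t_proc


def speedup(t_io, t_proc, n_tasks):
    return serial_time(t_io, t_proc, n_tasks) / overlapped_time(t_io, t_proc, n_tasks)


# Equal I/O and processing time: the gain approaches, but never reaches, 2.
balanced = speedup(1.0, 1.0, 1000)
# Processing three times longer than I/O: far less of the total is hidden.
unbalanced = speedup(1.0, 3.0, 1000)
```

The gain is largest when I/O time and processing time are equal, which is the
case the paragraph above anticipates; any imbalance reduces it.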

Implementation of a bus-based CPH could provide a similar increase in
throughput, as well as increase overall system flexibility, since additional
special-purpose modules could be designed to hang on the CPH system bus. In
terms of impact on development costs, the IOP approach would add very little.
A few more chips would be required than if the CPH were capable of only very
rudimentary I/O, but the price of these additional chips is negligible when
compared to the cost of system memory. Design time would increase very
little, as some type of I/O circuitry must be developed in any case. While
the implementation of an IOP is more sophisticated than the primitive
approach, the task of design may actually be somewhat simplified by having a
microprocessor to handle control and routing of data.

Development of a system bus for the CPH would be the most expensive in
terms of both resources required and design time required. A number of
additional considerations should be taken into account in determining which
I/O approach to take. Among these is the idea of developing a macro or
assembly language for the CPH. The CPH is a poor architecture for
implementing looping or branching in programs. Also, processing of scalar
operations is not one of the CPH's strong points. This means that under the
primitive approach to I/O, separate and distinct microprograms must be written
for every task it is to accomplish. Writing microprograms is a complicated,
time-consuming task that requires an intimate knowledge of the architecture.
In addition, implementing scalar operations in microcode results in
inefficient use of processor time.

Designing an IOP for the CPH would allow development of a library of
fundamental microcode routines which could be assembled into many useful, much
larger routines. These assembled routines might not make the most efficient
use of the processor, but in terms of time saved in not having to write long,
complicated microprograms, this could be a very attractive feature to
potential users. In addition, the microprocessor used in the IOP could be
used to improve processing of scalar operations, something for which the
microprocessor is better suited than the CPH. For the project at hand,
development of the macro language does not have to be done, but if this
capability is desired, it must be designed in now, or the system will have to
be redesigned at a later time when the feature becomes desirable. That would
be a waste of both time and money.

Development of a system bus is important in a multi-CPH system, or in a
turnkey or stand-alone CPH-based system. Currently, development of an IOP for
the system seems a desirable and cost-effective approach to take.
Microprogram storage RAM costs about $0.40 per instruction, and data cache RAM
costs about $0.08 per word. External memory for storage of IOP data and
programs would cost less than $0.0025 per word. When viewed in this light, the
IOP approach may be the least expensive approach to take, since RAM space for
storing IOP programs is much less expensive than RAM space for microcode
routines to handle I/O. The microprogram memory will not have to be as deep
if an IOP is used, and the money saved on microprogram storage space will
likely pay for the parts required to construct an IOP.
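
The per-word figures quoted above make the comparison easy to work through.
The sketch below uses only those three unit costs; the routine size of 1000
locations, and the assumption that an I/O handler takes the same number of
locations in either memory, are hypothetical and chosen purely for
illustration.

```python
# Unit costs quoted in the text (dollars per stored item).
COST_MICROCODE_PER_INSTRUCTION = 0.40    # microprogram storage RAM
COST_CACHE_PER_WORD = 0.08               # data cache RAM
COST_EXTERNAL_PER_WORD = 0.0025          # external IOP memory (upper bound)


def storage_cost(n_items, unit_cost):
    """Cost in dollars of storing n_items at the given unit cost."""
    return n_items * unit_cost


# Hypothetical I/O handler occupying 1000 locations in either memory.
HANDLER_SIZE = 1000
in_microcode = storage_cost(HANDLER_SIZE, COST_MICROCODE_PER_INSTRUCTION)
in_iop_memory = storage_cost(HANDLER_SIZE, COST_EXTERNAL_PER_WORD)
ratio = in_microcode / in_iop_memory
```

Under these assumptions the handler costs about $400 to hold in microprogram
RAM against about $2.50 in external IOP memory, a ratio of roughly 160 to 1.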

1.1.2.3 Development of EVA Control Store

In order to effectively use EVA with as many microprograms as possible,
a writable control store organization was chosen. This organization allows
the user to load in at runtime as many microprograms as are needed for a
sequence of tasks. This type of control store makes very efficient use
of the costly high-speed RAM, since microprograms can be loaded and
subsequently unloaded to free precious space. Reusing the control store space
requires different supporting hardware than an EPROM or fixed microcode
memory.

A typical control store circuit is shown in Figure 7. With this design,
one sees that interruption, micro-level subroutining, and context switching
are supported, as is necessary in writable control stores. An adder is
included in order to compute address offsets so that relative addressing can
be supported at the microcode level. In writable control store architectures,
relative addressing is necessary; otherwise users could not download
microprograms without wasting writable control store space. To avoid the
loss, every microprogram should fit in the next available location. However,
that location would not be known a priori. So some hardware must be included
in the controller to offset locations from the last microprogram loaded into
the WCS.
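
The role of the offset adder can be sketched in software. In the model below
each microprogram is loaded at the next free location and every branch target
is stored as an offset from the start of its own microprogram, so the same
microcode runs unchanged wherever it lands. The tuple-based microinstruction
format ("nop", "jump", "halt") and the class name are hypothetical and stand
in for the real microword.

```python
ADDRESS_BITS = 20
WCS_SIZE = 1 << ADDRESS_BITS


class WritableControlStore:
    """Toy writable control store with base-relative branching."""

    def __init__(self, size=WCS_SIZE):
        self.size = size
        self.memory = {}         # sparse store: address -> microinstruction
        self.next_free = 0       # first unused location
        self.base_of = {}        # microprogram name -> load address

    def load(self, name, microprogram):
        """Place a microprogram at the next free location; return its base."""
        base = self.next_free
        if base + len(microprogram) > self.size:
            raise MemoryError("writable control store is full")
        for offset, instruction in enumerate(microprogram):
            self.memory[base + offset] = instruction
        self.base_of[name] = base
        self.next_free = base + len(microprogram)
        return base

    def run(self, name, max_steps=1000):
        """Execute from the program's base; return the addresses visited."""
        base = self.base_of[name]
        pc = base
        trace = []
        for _ in range(max_steps):
            op, arg = self.memory[pc]
            trace.append(pc)
            if op == "halt":
                break
            if op == "jump":
                # The adder: absolute target = base + relative offset.
                pc = (base + arg) % self.size
            else:
                pc += 1
        return trace
```

Because the jump operand is relative, a microprogram loaded after another one
visits different absolute addresses but follows exactly the same path.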

Stack pointers can also be supported by the stack pointer registers in
the upper left portion of Figure 7. This facility makes microprogram coding
simpler and spares the user complicated address calculations in advance.
Stack pointers also facilitate subroutine calls and nesting. An address space
exceeding 64k is desired because of the several simultaneously loaded
microprograms which should be resident in the WCS. Thus, the counters and
adder should handle 20 bits instead of 16 bits (16 bits spans only 64k).
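
A small sketch shows both points: why 16 bits is not enough for addresses
above 64k, and how a stack of return addresses supports nested
micro-subroutine calls. The stack depth of 16 and the class name are
assumptions made for illustration; the report does not state the depth.

```python
ADDRESS_BITS = 20
ADDRESS_MASK = (1 << ADDRESS_BITS) - 1

# 16 bits address 65,536 (64k) locations; 20 bits address 1,048,576.
LOCATIONS_16 = 1 << 16
LOCATIONS_20 = 1 << ADDRESS_BITS


class MicroStack:
    """Return-address stack for nested micro-subroutine calls."""

    def __init__(self, depth=16):        # depth is an assumed value
        self.slots = [0] * depth
        self.sp = 0                      # stack pointer: next free slot

    def push(self, address):
        if self.sp == len(self.slots):
            raise OverflowError("micro-subroutine nesting too deep")
        self.slots[self.sp] = address & ADDRESS_MASK
        self.sp += 1

    def pop(self):
        if self.sp == 0:
            raise IndexError("return without a matching call")
        self.sp -= 1
        return self.slots[self.sp]
```

An address such as 0x12345 lies above the 64k boundary; it survives a push
and pop here, but a 16-bit stack would return only 0x2345.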

A search of off-the-shelf components for a microsequencer 20-bit adder
faster than 50 nsec found no such devices. Even the counter must be built up
from discrete devices in order to achieve 50 nsec speeds. An estimate of the
chip count for discrete logic components for the complete sequencer indicates
that at least 50 24-pin chips may be needed. The Phase II investigation
proceeded to analyze faster and denser FPGA chips, among them the chips from
Plus Logic. It was found possible for one FPGA to replace 50 random logic
devices. The board space savings became very attractive. In addition, the
ability to reprogram an FPGA without having to redesign the entire PCB became
even more attractive.

During 1990, software was received from Plus Logic to evaluate the FPGA
devices STC anticipated for the microprogram sequencer and address generators.
That code helped STC to lay out a chip from the standard cells available from
Plus Logic. Using an FPGA is important because design changes can now be made
to the device instead of the already manufactured PCB (which may be cost
prohibitive). STC anticipated using the Plus Logic devices for a 20-bit adder
and counter. The major issue in the speed was the need for carries and
borrows across 20 bits.

Five 4-bit adders could have been used, but carry lookahead circuits would
have had to be built. Xilinx at first appeared to be an adequate solution, but
later investigation showed that Xilinx cells were suitable only for random
logic and not for adders and counters. The basic Xilinx cell, called a
Configurable Logic Block (CLB), is depicted in Figure 8. Each cell comprises
two flip-flops (FFs) and a combinatorial logic section containing a
program-memory-controlled multiplexer. Subsequently, the FPGA design for the
two-dimensional counters was completed with some custom library components
provided by Plus Logic. Every I/O pin and functional block of the FPGA2020
was used.
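
The five-adder option mentioned above can be illustrated with a group carry
lookahead scheme. This is a generic textbook arrangement, not the circuit STC
built: each 4-bit group reports whether it generates a carry on its own or
would propagate an incoming one, and the lookahead stage derives every group's
carry-in from those two signals. The function name add20 is hypothetical.

```python
GROUP_BITS = 4
N_GROUPS = 5                             # five 4-bit adders span 20 bits
GROUP_MASK = (1 << GROUP_BITS) - 1


def add20(a, b, carry_in=0):
    """Add two 20-bit values; return (20-bit sum, carry_out)."""
    xs = [(a >> (GROUP_BITS * k)) & GROUP_MASK for k in range(N_GROUPS)]
    ys = [(b >> (GROUP_BITS * k)) & GROUP_MASK for k in range(N_GROUPS)]

    # Group generate: the group produces a carry with no carry coming in.
    generate = [(x + y) >> GROUP_BITS for x, y in zip(xs, ys)]
    # Group propagate: an incoming carry would ripple through all 4 bits.
    propagate = [int((x ^ y) == GROUP_MASK) for x, y in zip(xs, ys)]

    # Lookahead stage. Hardware flattens this recurrence into two-level
    # logic so no carry waits on a neighboring adder; the loop is only
    # the software equivalent.
    carries = [carry_in]
    for k in range(N_GROUPS):
        carries.append(generate[k] | (propagate[k] & carries[k]))

    total = 0
    for k in range(N_GROUPS):
        group_sum = (xs[k] + ys[k] + carries[k]) & GROUP_MASK
        total |= group_sum << (GROUP_BITS * k)
    return total, carries[N_GROUPS]
```

The extra generate and propagate logic is the "carry lookahead circuit" that
would have had to be built around the five discrete adders.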
It was desired that part of the EVA microprogram sequencer fit into an
FPGA. Plus Logic began working on a custom component for another company: an
adder, mux, and incrementer all in one part. Once this component was
completed, Space Tech was to evaluate it and determine whether it could be
used as part of the microprogram sequencer. It was not completed.

Several of the CPH's circuits required large numbers of small- and
medium-scale integrated circuits. Some of these could be reduced to a few
chips with the use of Field Programmable Gate Arrays (FPGAs) from Plus Logic.
FPGAs from other sources had been evaluated and found unsuitable for use in
the CPH. High-speed adders and counters are required. Plus Logic FPGAs can be
used to implement counters of any number of bits which can be clocked at
40 MHz. Adders have a carry propagation time of 1 nsec per bit. This was
significantly faster than any other FPGAs.
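
These figures can be checked against the 50 nsec sequencer target with a
little arithmetic. The sketch below counts only the quoted carry propagation
and clock rate; set-up, routing, and output delays are ignored, so the
results are optimistic lower bounds rather than guaranteed timings.

```python
TARGET_NS = 50.0                 # speed sought for the 20-bit adder and counter


def ripple_carry_delay_ns(bits, ns_per_bit=1.0):
    """Worst-case carry propagation across an adder of the given width."""
    return bits * ns_per_bit


def clock_period_ns(frequency_mhz):
    """Period of a clock in nanoseconds."""
    return 1000.0 / frequency_mhz


adder_delay = ripple_carry_delay_ns(20)      # 20 bits at 1 nsec per bit
counter_period = clock_period_ns(40.0)       # counters clocked at 40 MHz
meets_target = adder_delay < TARGET_NS and counter_period < TARGET_NS
```

A 20-bit carry chain takes 20 nsec and a 40 MHz counter steps every 25 nsec,
so both fall inside the 50 nsec budget under these simplified assumptions.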

Plus Logic's FPGAs are constructed with an EPROM technology which allows
them to be easily reprogrammed. This is another advantage of using FPGAs in
the CPH. The ability to modify a section of circuitry on an FPGA as opposed
to modifying a printed circuit board is an important feature. A mistake or
modification to a printed circuit board could require a new board. This would
mean an NRE charge of several thousand dollars. With extensive use of FPGAs
and PALs it is possible to change a circuit without actually rewiring the
circuit board. The larger the FPGAs, the better the chance of being able to
make a change.

FPGAs also result in a significant parts reduction. For example, the
section of the address generator board containing four two dimensional
counters and an incrementer file would require 125 chips. With the use of
Plus Logic FPGA2040 arrays the parts count could be reduced to 16. However,
these chips are not yet available. The use of the proposed smaller (and
available) FPGA2020 arrays would result in a part count of 36. The savings in
board manufacturing costs and engineering costs alone offset the cost of the
Plus Logic development system. The basic 2020 device is depicted in Figure 9.

1.1.2.4 Development of PC Interface Board

To coordinate design, development, and testing, a special PC interface
board was designed first. An initial candidate for the PC interface board was
designed based on the following assumptions. First, WSMR will use a Zenith
286 to interface to the CPE. Second, the same board will be used to test the
CPE boards during code development at STC where a 286 PC will be used. Third,
the interface control from the perspective of both machines (the PC as well as
the CPE) is basically, "the PC (or CPE) sees a register from which to 'write
to' or 'read from'". However, the PC has a 16-bit bus and the CPE has a 32-bit
bus. Hence, the interface board must multiplex data accordingly depending on
the direction of the data. Fourth, the board was designed to easily interface
to typical bit-slice architectures such as the CPE. Fifth, the board shall be
capable of driving high-speed data across long distances. Here, EIA RS-422
receivers are used. To invoke the simple handshake protocol described earlier,
FIFOs were used on the board. FIFO signals such as almost full and almost
empty are to be monitored.
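
The 16-bit/32-bit multiplexing described above can be illustrated with a minimal sketch. The function names are hypothetical, and the transfer order (low half first) is an assumption; the report does not state which half crosses the PC bus first.

```python
def split_word32(word):
    """Split one 32-bit word into (low, high) 16-bit halves for the PC side."""
    if not 0 <= word <= 0xFFFFFFFF:
        raise ValueError("word must fit in 32 bits")
    return word & 0xFFFF, (word >> 16) & 0xFFFF


def join_halves16(low, high):
    """Reassemble two 16-bit PC transfers into one 32-bit word."""
    if not (0 <= low <= 0xFFFF and 0 <= high <= 0xFFFF):
        raise ValueError("each half must fit in 16 bits")
    return (high << 16) | low


low, high = split_word32(0x12345678)
print(hex(low), hex(high))             # 0x5678 0x1234
print(hex(join_halves16(low, high)))   # 0x12345678
```

Each 32-bit word therefore costs two PC bus transfers in either direction, which is one reason the FIFOs are useful for decoupling the two sides.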

1.1.2.5 Study of PCB Manufacture Techniques

Central to the eventual Phase II objectives was a study of PCB
techniques. A search and analysis of quality board manufacturers was done
with in-depth feedback from Unicircuit in Englewood, Colorado. The factors
with the greatest impact on cost and complexity of manufacture include the
number of layers and the use of interstitial or blind vias. (A blind via is a
hole which is buried inside the layers or only comes out one side of the
board. Using such a via makes bed-of-nails testing almost impossible because
the fixture cannot touch this via directly.) The physical dimensions of the
eventual board have some effect when the board exceeds 8" x 10". Trace widths
less than 8 mils and via sizes smaller than 15 mils would also significantly
increase cost. When the boards are laid out, special vias will be
reduced and replaced with another layer since this approach is less costly.
Traces and spaces of 10 mils can be used effectively. Manufacturers suggested
that this line width offers the best price per real estate.

1990 tooling charges were approximately $100 per layer. Fabrication
costs for an 8-layer board with low complexity were approximately $200 for a
board of approximately 8" x 15". Costs for creation of the bed-of-nails test
fixture for checking board integrity are about $500 on the basis of a pin
count of 3000.

Subsequently, PCB fabrication, assembly, and test were approximately
$1700 per board, assuming 10-layer boards with pin counts up to 2000 per
board. EVA architecture originally anticipated 4 boards: a CPH, an IOP, a
cache memory, and the VPH. At a minimum, $1200 was to be expected for the PCB
effort of a single board. This did not include parts or functional circuit
testing at STC. Final costs rose to $2200 per board.

1.2 Results of the EVA Phase II Project

As mentioned earlier, the Phase II development effort brought
significant changes to the Cascadable Processor Hardware (CPH). Figure 10
depicts the current CPH. It differs from the previous architecture in that
two ALUs and two multipliers are embedded on each board instead of one each
per board. From design efforts early in Phase II, it was determined that
doubling the processing power on a CPH board could reduce the data traffic
bottlenecks for the HSIO and facilitate 64-bit processing on one board instead
of two. In order to accomplish this integration, a new chip was designed
called the Crossbar. This chip was fabricated by ILSI in Colorado Springs for
the EVA architecture and is described in a later section. Such a chip was
necessary to reduce the several multiplexers into one single device for the
CPH. The datapath from ALUs to general purpose registers in Figure 11 was one
example of significant crossbar usage. Later, it was discovered that the same
chip could be used in the address generator board.

The EVA organization began to solidify by the second year into several
boards. During a site visit, Mr. Lam and John Williams from WSMR-ID reviewed
the new EVA architecture. Later, discussion with ID found a direct application
with another WSMR SBIR contractor, Mentor. The Mentor application included
radar tracker processing. The majority of that processing task centered
around the Kalman filter. This directed the STC design team's attention to
fast address generation for the complex matrix operations. An address
generator was sought that would produce complex addresses in hardware at
real-time speeds so that no computational overhead would result. A study of
matrix algorithms was also initiated to ensure that EVA throughput was high.
That algorithm study is discussed in Section 4.

Each board performed a separate and distinct function so that a
cascadable design became feasible. As an introduction, those boards are
briefly discussed in Section 2.0. The boards as organized developed into a
very powerful computing engine and exceeded the performance specifications of
the Phase II proposal by two orders of magnitude in some cases. A single EVA
machine could perform over 30 operations per clock. Hence, if a 20 MHz clock
were used, EVA would be a 600 Mflop machine in a single desktop machine. The
innovation became so attractive to Space Tech that the current EVA
architecture was proposed.

Later results during the second year proved to be demanding for the
design team at Space Tech. Advanced devices that were designed into the
architecture had to be removed because the devices did not become available,
were removed from production, or were functionally changed. The Plus Logic
FPGA 2040, which was to be an integral part of the address generator, never
became available. The 2020 was substituted. The AMD 29540 FFT address
generator chip was deleted from inventory. Finally, the BIT devices that were
delivered lacked some of the vital control and status signals promised in the
advanced specifications. As these were sole source suppliers, the EVA
architecture design had to undo some of the effort and restart with less
powerful chips like the FPGA 2020.

The VPH effort proceeded more smoothly since all parts remained
available throughout the project. One major new chip discovery in December of
1989 which reduced board space needs was a four port RAM from IDT with a
7052S35G part number. This single device reduced space by 20%, which allowed
more functionality to be embedded on the VPH. Prior to that only the Micron
Technology MT42C8128 was available and was seriously being considered. It was
an expensive part.

During May of 1990, after considerable discussion with the technical
monitor, the value of making the architecture more general purpose became more
apparent. To that end, several changes were made to the schematic of the VPH.

The input bus to the board from the VME was originally designed to be
only a 32-bit interface. Modifications have been made which allow the
interface to be configured either as a 16- or 32-bit bus through the use of a
simple jumper scheme. Due to the type of processing the VPH is designed to
perform, namely DSP, and the computational speed it is capable of maintaining,
I/O bandwidth becomes a serious concern. In fact, the VME bus would be sorely
strained to keep the VPH busy. Because of this fact, it was originally
proposed to make the 68020 processor bus available off the board. This was
proposed to allow for the development of external A/Ds and D/As which would
interface to the 68020. From subsequent discussions with Mr. Lam, it became
apparent that it would be beneficial if "off the shelf" D/As and A/Ds could be
interfaced directly to the VPH. To that end and because the 68020 bus is so
similar to the VME bus, it was decided in May of 1990 to provide a rudimentary
VME bus, devoid of the layers of protocol, but able to support simple I/O
boards. The final design of 1992, however, provides a full VME bus due to the
desire to interface to single board computers (SBC) acting as masters.

The form factor of the VPH board was selected to be a VME 9U and has the
capability of holding 4k words of data RAM. Because 4k words is not enough
memory for some large data set problems, it was decided to allow for memory
expansion. Expansion is accomplished by the addition of daughter cards which
sandwich to the base board. Each daughter card contains an additional 4k
words and all of the required bus buffering and decoding. Up to three
additional boards may be added to the base board, bringing the data RAM up to
16k words. Provisions have been made to support the new 4kx8 chips when they
become available. This would double the data space.

The need for flexibility gave rise to a possible enhancement to the VPH.
Because of the similarity of the VME and the IBM-AT and EISA bus
architectures, the possibility of mounting the VPH in an external box with
power supply and minimal interfacing logic was investigated. This
would allow the same board with no modifications, only additions, to be
interfaced to a commonly available and inexpensive computational platform.

By June of 1990, a general VPH concurrent operating scheme, built around a
status latch through which the five processors may share status information,
was agreed upon with WSMR. The need for such a status latch arose from the
multiprocessor nature of this system. Consider, as an example, the task of
performing a two-dimensional FFT, with processing by all four Zorans. Roughly
stated, the procedure is to first perform FFTs on the rows of the matrix, then
perform FFTs on the resulting columns. The four Zorans share the work of
performing these FFTs. Because of the way the problem will be partitioned,
the Zorans will not complete the initial task of computing row FFTs at the
same instant. Some of the processors must therefore wait before
the column FFTs may be computed. The status latch concept will allow the
Zorans to keep track of the status of their companion processors without the
intervention of the 68020, keeping it free to perform other tasks. Later it
was agreed that assigning each processor two status bits should allow for
ample versatility.
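
The row-then-column procedure can be sketched in a few lines. This is only an illustration under stated assumptions: the four "workers" are simulated sequentially, rows and columns are dealt out round-robin (the report does not describe the actual partitioning), a plain DFT stands in for the Zoran FFT, and the `rows_done` list plays the role that the status bits would play in hardware.

```python
import cmath


def dft(seq):
    """Direct discrete Fourier transform of a short sequence."""
    n = len(seq)
    return [sum(seq[k] * cmath.exp(-2j * cmath.pi * j * k / n)
                for k in range(n))
            for j in range(n)]


def fft2_row_column(matrix, workers=4):
    """2-D transform: row transforms first, then column transforms."""
    rows, cols = len(matrix), len(matrix[0])
    stage1 = [None] * rows
    rows_done = [False] * workers        # stand-in for per-processor status

    # Pass 1: each worker transforms its share of the rows.
    for w in range(workers):
        for r in range(w, rows, workers):
            stage1[r] = dft(matrix[r])
        rows_done[w] = True

    # Barrier: no column may start until every worker reports its rows done.
    if not all(rows_done):
        raise RuntimeError("row pass incomplete")

    # Pass 2: each worker transforms its share of the columns.
    out = [[0j] * cols for _ in range(rows)]
    for w in range(workers):
        for c in range(w, cols, workers):
            column = dft([stage1[r][c] for r in range(rows)])
            for r in range(rows):
                out[r][c] = column[r]
    return out
```

The barrier between the two passes is the point at which companion status must be visible to every processor, which is what motivates the shared latch.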

Examination of a preliminary design for the status latch shared among
the processors revealed that the design was deficient in several respects.
The latch would allow any processor to write status bits to the latch, but in
the case of the Zorans, whenever one Zoran wrote its status the status of its
bus companion would be lost from the latch. To prevent loss of status bits
from the latch, a duplicate image of the status bits for both Zorans on a bus
would have to be maintained in the PRAM for that bus. A Zoran expecting to
write its status would first read the status image in the PRAM, would write
back to PRAM an updated status nibble reflecting the new status, and would
finally write the updated nibble to the status latch. This sequence requires
a read and a write to PRAM and a write to the status latch. The time involved
is not a major concern, since writing out status info represents only a very
small fraction of the tasks performed. However, this sequence of operations
contains a hazard which could result in problems. Between the time a Zoran
reads and writes to the PRAM and the time it writes to the status latch, it is
feasible that it might lose mastership of the bus. In the event that the new
bus master is the companion Zoran updating its status, the original Zoran,
upon regaining mastership of the bus, will write a status nibble to the latch
which is erroneous. While the chance of this sequence of events occurring is
rather slim, such an occurrence could prove fatal to a process, since an
incorrect reflection of processor status could effectively "lock up" a bus.
It was determined that this design for the status latch would be scrapped in
favor of a different design which will avoid the previously discussed hazard,
require only a single write to update status, and additionally, use
less expensive components in its implementation.
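
The hazard can be reproduced with a small deterministic simulation. The names below are hypothetical, and the layout of the nibble (two bits per Zoran, slot 0 in the low bits) is an assumption made only for illustration.

```python
def merged(nibble, slot, bits):
    """Return `nibble` with the 2-bit status field of `slot` replaced."""
    shift = 2 * slot
    return (nibble & ~(0b11 << shift) & 0xF) | ((bits & 0b11) << shift)


class Bus:
    """One bus with its PRAM status image and its status latch."""
    def __init__(self):
        self.pram_image = 0
        self.latch = 0


def full_update(bus, slot, bits):
    """Uninterrupted sequence: read PRAM, write PRAM, write latch."""
    nibble = merged(bus.pram_image, slot, bits)
    bus.pram_image = nibble
    bus.latch = nibble


def interrupted_update(bus, slot, bits, intruder):
    """Same sequence, but bus mastership is lost before the latch write."""
    nibble = merged(bus.pram_image, slot, bits)   # read PRAM image
    bus.pram_image = nibble                       # write updated image
    intruder(bus)                                 # companion takes the bus
    bus.latch = nibble                            # stale nibble reaches latch


bus = Bus()
interrupted_update(bus, 0, 0b01, lambda b: full_update(b, 1, 0b10))
print(bin(bus.pram_image), bin(bus.latch))   # 0b1001 0b1
```

After the interleaving, the PRAM image holds both statuses while the latch has lost the companion's update, which is exactly the stale write the paragraph describes.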

2.0 Brief Description of VPH and CPH Architectures

Much of the developmental history of EVA has been given in Section 1 so
that one could have an appreciation for the design approach. In this section
the reader will see the influence of the developmental history on the
interfaces among the EVA functional units and the host. As stated earlier,
EVA is composed of two main functional units, the VPH and the CPH
architectures. EVA can be organized to expand in two dimensions, one through
adding VPH boards and the other through adding CPH subsystems.
Adding VPH boards is straightforward. All that is
necessary is a simple insertion in the VME backplane. However, the CPH
expansion uses different microprograms that share the common data buses. It
is even possible for the CPHs to share the same cache memory. In this manner a
user saves two additional boards, a cache memory board and an address
generator board. But the cost savings should be weighed against the
larger and more complex microprograms needed for sharing a single cache memory
space.


The design philosophy of EVA has been to provide a user friendly system
that can be expanded easily. The advantage to this approach is obvious. The
disadvantage is the increased system complexity of a very general
organization. To understand the organization further, the following sections
describe the interfaces to hosts and the internal control of the CPH. Both of
these high level views will aid the reader in comprehending the EVA computer.
The following paragraphs quickly outline the major functional capabilities on
each of the boards. Section 2.1 concentrates on the multiple CPH interfaces.
Section 2.2 focuses on the VPH interface and programming model. The VPH, as a
separate unit, is intended for operation in any computing system with a VME
backplane. Hence, it is important to grasp the VME interface capabilities of
the VPH. More specific descriptions of the CPH and VPH follow in Section 3
and are useful for the microprogrammer.

PROCESSOR BOARD DESCRIPTION

The processor contains two multipliers, two ALUs, microprogram storage
memory, a crossbar, a register file, and various I/O ports. Many
configurations are possible by using different interconnections between
processors and combinations of processors and memory banks. Descriptions of
the processor's major components follow.

ARITHMETIC COMPONENTS

The multipliers and ALUs support a wide range of number formats. These
include 32 and 64 bit fixed-point, single and double precision IEEE
floating-point, and DEC F and G formats. Each multiplier has a throughput of 20
megaflops for all number formats. The ALUs each have a throughput of 40
megaflops for all number formats; however, the bandwidth of the buses may
limit double precision throughput to 20 megaflops. Total throughput of 120
megaflops could be possible with a single processor board.
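
The 120 megaflop figure follows directly from the per-unit rates. The short check below reproduces it; the 80 megaflop value for bus-limited double precision is an inference from the stated 20 megaflop ALU limit, not a number given in the report.

```python
MULTIPLIERS = 2
ALUS = 2
MULTIPLIER_MFLOPS = 20
ALU_MFLOPS = 40
ALU_MFLOPS_BUS_LIMITED = 20   # double precision, when the buses are the limit


def board_peak_mflops(alu_mflops=ALU_MFLOPS):
    """Peak rate of one processor board: two multipliers plus two ALUs."""
    return MULTIPLIERS * MULTIPLIER_MFLOPS + ALUS * alu_mflops


print(board_peak_mflops())                        # 120
print(board_peak_mflops(ALU_MFLOPS_BUS_LIMITED))  # 80 (inferred)
```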

MICROPROGRAM STORAGE RAM


The processor operates on a 50 nsec instruction cycle. Each
microinstruction is 192 bits wide by two phases long. Each phase is like a
separate instruction 25 nsec long, although they are always selected in pairs,
giving a 50 nsec instruction cycle. The memory is 16,384 deep. That’s 16,384
instructions by 2 phases by 192 bits. This memory can be written to through
the I/O ports, 64 bits at a time.
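
As a quick sizing check on the figures above, the following sketch computes the total microprogram store and the number of 64-bit port writes needed to fill it. It only restates the arithmetic; no loading protocol is implied.

```python
DEPTH = 16_384          # instructions
PHASES = 2              # phases per instruction
BITS_PER_PHASE = 192
PORT_WIDTH = 64         # bits written per I/O port transfer

total_bits = DEPTH * PHASES * BITS_PER_PHASE
total_bytes = total_bits // 8
writes_per_instruction = PHASES * BITS_PER_PHASE // PORT_WIDTH
total_writes = total_bits // PORT_WIDTH

print(total_bits)              # 6291456
print(total_bytes)             # 786432 (768 KiB)
print(writes_per_instruction)  # 6
print(total_writes)            # 98304
```

A full microprogram load therefore takes 98,304 port writes, six per instruction.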

RECONFIGURABLE  REGISTER  FILE 

The register file has 64 double precision registers organized as an 8 by
8 array. Four independent ports allow high speed access to the registers.
Two ports are write only and two are read only. Each port has its own address
and a bandwidth of 40 MHz. Two reads and two writes can be done
simultaneously. All accesses are synchronous, so a single location can be
both read from and written to in the same instruction cycle.

The register file also has four different modes of operation. One is
normal RAM access. The others link register locations into multiple
pipelines. Configurations of 8 pipelines 8 deep, 4 pipelines 16 deep, and 2
pipelines 32 deep are possible. When configured as a pipeline, writing data
to the first location of a pipe causes all data in that pipe to be shifted to
the next register location. Data may be read out from any stage of the pipe.
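
The pipeline modes can be modelled with a small behavioural sketch. The class and method names are hypothetical, and discarding the value shifted out of the last stage is an assumption; the report does not say what happens to it.

```python
class PipelinedRegisterFile:
    """64 registers arranged as 8x8, 4x16, or 2x32 shift pipelines."""
    SIZE = 64

    def __init__(self, pipelines):
        if pipelines not in (8, 4, 2):
            raise ValueError("supported modes: 8, 4, or 2 pipelines")
        self.depth = self.SIZE // pipelines
        self.pipes = [[0] * self.depth for _ in range(pipelines)]

    def push(self, pipe, value):
        """Write to the first stage; every older value moves one stage on."""
        stages = self.pipes[pipe]
        stages.insert(0, value)
        stages.pop()              # assumed: the oldest value is dropped

    def read(self, pipe, stage):
        """Data may be read from any stage of a pipe."""
        return self.pipes[pipe][stage]


rf = PipelinedRegisterFile(4)     # 4 pipelines, 16 deep
for sample in (1, 2, 3):
    rf.push(0, sample)
print(rf.depth, rf.read(0, 0), rf.read(0, 2))   # 16 3 1
```

Reading from several stages of one pipe in this way gives a tapped delay line, which suits filter-style computations.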

CROSSBAR NETWORK

All arithmetic components, register file ports, and I/O ports are linked
by an extensive crossbar network. Each arithmetic component has two input
ports and one output port. These, along with external I/O ports and register
file ports, have a dedicated port into the crossbar. This allows for all
possible paths to occur simultaneously. All paths may be switched
simultaneously at a rate of 40 MHz.

I/O  PORTS 

The processor board has 6 dedicated input ports, 4 dedicated output
ports, and 2 bidirectional ports. Each port is 32 bits wide with a
bandwidth of 40 MHz. These ports may be used to link the processor to memory
banks or link multiple processors together or both.

ADDRESS  GENERATOR  BOARD  DESCRIPTION 

The address generator is a specialized processor with an architecture
optimized to generate complex sequences of addresses for various vector and
matrix operations. This will offload the arithmetic processor and allow
higher throughputs. Microprograms for complex routines will be much shorter
and easier to write. The address generator architecture has 4 two dimensional
counters, 2 address look up table RAMs, microprogram storage memory, address
output ports, a register file, and a crossbar. All data paths and components
of the address generator are 16 bits wide.

TWO DIMENSIONAL COUNTER


Each two dimensional counter contains 2 preloadable up/down counters,
two adders, two registers, and a multiplier. This hardware is designed to do
array subscript expansion. After initializing, the counter can simultaneously
index up or down the rows and columns of an array. This allows many complex
routines to be programmed quickly and efficiently. Each of the four counters
can be used to access a different array or vector in memory. Three of these
counters contain an FFT address sequencer. This will allow various types of
FFTs, including two dimensional FFTs, to be programmed efficiently.
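
Array subscript expansion amounts to computing base + row x row length + column for each element visited. The sketch below is a software analogue only; the row-major layout and the function name are assumptions, and the way the hardware counters, adders, and multiplier are actually wired is not described here.

```python
def walk(base, rows, cols, by_columns=False):
    """Yield addresses of a row-major array, by rows or by columns."""
    outer, inner = (cols, rows) if by_columns else (rows, cols)
    for o in range(outer):
        for i in range(inner):
            r, c = (i, o) if by_columns else (o, i)
            yield base + r * cols + c     # subscript expansion


print(list(walk(100, 2, 3)))                   # [100, 101, 102, 103, 104, 105]
print(list(walk(100, 2, 3, by_columns=True)))  # [100, 103, 101, 104, 102, 105]
```

The column-order walk is the access pattern needed for the column pass of a two dimensional FFT, and generating it in hardware removes the index arithmetic from the arithmetic processor's microprogram.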

ADDRESS  LOOK  UP  TABLE  RAMS 

These RAMs can be used for indirect addressing or for storing sequences
of addresses too complex to calculate in real-time. Each of these RAMs is 16
bits wide by either 32k or 64k deep. They can be accessed at a rate of 20
MHz.


MICROPROGRAM  STORAGE  MEMORY 

The size of this memory is 16,384 instructions by 2 phases by 188 bits.
It functions the same as the processor’s memory.

MICROPROGRAM  SEQUENCER 

This sequencer generates addresses at a rate of 20 MHz to be used to
access microprogram memory and provide program flow control. Both relative
and direct addressing modes are possible. A stack of 4096 words is used for
subroutine calls and a 16 bit counter is provided for loop counting.

ADDRESS  OUTPUT  PORTS 

Three 18 bit ports are provided for outputting addresses. Each of these
ports can run at a rate of 40 MHz. A 16 bit microprogram address output port
is also provided. This feature allows the microword of the address generator
to be combined with the processor and memory boards.

REGISTER  FILE 

The register file for the address generator is identical to the register
file for the processor. Its primary use is for address pipelining and storing
pointers.

CACHE  MEMORY  BOARDS 

The cache memory is used to store reasonably large amounts of data for
use by the processor. The memory is organized as two banks of triple ported
static RAM, one bank for real data and the other for imaginary data. In each
instruction cycle one complex word can be written and two complex words can be
read from cache. All writes occur in the first clock phase and all reads in
the second. This eliminates all possibility of conflict. A single location
can be read from and written to in the same instruction cycle.

The cache memory hardware consists of memory blocks. Each block has two
banks of triple ported RAM. Each bank is 32 bits wide and the depth is
dependent upon which memory modules are used. Depths of 4k, 16k, and 64k are
currently possible. Each cache memory board has space for two memory blocks.

Memory blocks, via software control, can be linked together into banks.
Linking can be achieved both vertically, for greater depth, and horizontally,
for wider word width. Two blocks can be linked horizontally for 64 bit word
width. Any number of blocks can be linked vertically for a bank size up to
256k words. Up to 16 banks can be configured simultaneously; however, the
processor can only access one bank at any instant in time. Banks can be
toggled or paged through rapidly and any bank not being accessed by the
processor can be accessed by I/O.
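
The linking rules can be captured in a short sizing helper. This is a sketch under assumptions: "k" is taken as 1024 words, the function name and return format are hypothetical, and the board count simply applies the stated two blocks per board.

```python
K = 1024
BLOCK_DEPTHS = (4 * K, 16 * K, 64 * K)   # depths currently possible
MAX_BANK_DEPTH = 256 * K                 # words
BLOCKS_PER_BOARD = 2


def linked_bank(block_depth, vertical=1, horizontal=1):
    """Shape of a bank built by linking memory blocks."""
    if block_depth not in BLOCK_DEPTHS:
        raise ValueError("unsupported block depth")
    if horizontal not in (1, 2):
        raise ValueError("at most two blocks link horizontally")
    depth = block_depth * vertical
    if vertical < 1 or depth > MAX_BANK_DEPTH:
        raise ValueError("bank depth out of range")
    blocks = vertical * horizontal
    return {
        "depth_words": depth,
        "word_bits": 32 * horizontal,
        "blocks": blocks,
        "boards": -(-blocks // BLOCKS_PER_BOARD),   # ceiling division
    }


print(linked_bank(64 * K, vertical=4))
print(linked_bank(16 * K, vertical=2, horizontal=2))
```

For example, the deepest bank (256k words) needs four 64k blocks linked vertically, which occupies two cache memory boards.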

2.1 CPH Interface Architecture

The multiple interfaces among EVA are described in this section,
beginning with the CPH. This is to allow the reader a view from the host
computer's perspective and lay a foundation for the intimate hardware details
of the CPH and VPH in Section 3. EVA is primarily interfaced to a host via
the PC interface or ISA bus. Another interface was planned earlier for EVA
with a DT Connect bus but this proved to be costly in VPH board space and
was subsequently not included in the design. However, the design effort is
documented in Section 2.1.2 for completeness. In 1990, this bus appeared
to be becoming a de facto industry standard. By 1992, its popularity had
faded, limiting its usefulness for other CPH applications.

2.1.1 CPH/PC Interface

STC currently uses essentially the same ISA interface structure for both
the CPH and VPH. Advantages of going this route, as opposed to using very
different interface designs as was originally planned, include lower NRE for
the ISA-end cards, since only a single board design needs to be manufactured.
Also, the low-level ISA drivers are the same for the CPH and the VPH, so time
savings in software development have been realized. Another advantage is the
ability to interconnect the CPH and VPH through the common interface. This would
allow for some development of a CPH/VPH coprocessing system. The limited
bandwidth of this interface would obviously limit the usefulness of such an
interconnection in any real application, but it would certainly be adequate
for fundamental development.

The user view of the PC interface is depicted in Figure 12. In that
figure, the reader can see that the interface is comprised of a set of FIFOs
for READS and WRITES. Flags are available in a status register to monitor the
FIFO contents. Those flags include "almost full" and "almost empty" so that
very general device drivers can be used for the EVA computer. The interface
can also be interrupt driven as well as program driven and interrupt flags can
be found therein.
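
A program-driven driver built on these flags might look like the sketch below. It is a behavioural model only: the class, the FIFO depth, and the flag margin are hypothetical, since the report gives neither the FIFO size nor the thresholds at which the flags assert.

```python
from collections import deque


class FlaggedFifo:
    """Software model of a FIFO with almost-full/almost-empty flags."""

    def __init__(self, depth=512, margin=8):   # both values are assumed
        self.depth = depth
        self.margin = margin
        self._words = deque()

    def write(self, word):
        if len(self._words) >= self.depth:
            raise OverflowError("FIFO full")
        self._words.append(word)

    def read(self):
        if not self._words:
            raise IndexError("FIFO empty")
        return self._words.popleft()

    def status(self):
        """Flags as a driver would see them in the status register."""
        n = len(self._words)
        return {
            "empty": n == 0,
            "almost_empty": n <= self.margin,
            "almost_full": n >= self.depth - self.margin,
            "full": n == self.depth,
        }


def send_block(fifo, words):
    """Polling driver: write until 'almost full', return words accepted."""
    sent = 0
    for word in words:
        if fifo.status()["almost_full"]:
            break
        fifo.write(word)
        sent += 1
    return sent
```

Stopping at "almost full" rather than "full" leaves headroom for words already in flight, which is what lets a very general driver work without knowing the exact latency of the link.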

A parity bus transceiver connects to the PC bus so that even and odd
parity can be checked. The selection is made via the status bits in the
status register and the appropriate driver code. Both the PC and EVA ends
must observe the chosen protocol. For physical distances greater than 3 feet,
the RS-422 interface was chosen. Twisted pair shielded cable then ensures
noise free operation. Programming the interface board is described in Section
3.2.6. Also, to take advantage of the high-speed nature of DSP applications,
the VPH intended for DSP has a slightly different interface on its end. The
differences are discussed in Section 3.2.6.1.
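
For reference, even and odd parity over a data word can be computed as follows. This is a generic sketch of the parity rule, not the transceiver's register interface; the function names are hypothetical.

```python
def parity_bit(word, odd=False):
    """Parity bit that makes the total count of ones even (or odd)."""
    bit = bin(word).count("1") & 1
    return bit ^ 1 if odd else bit


def parity_ok(word, bit, odd=False):
    """Check a received word against its received parity bit."""
    return parity_bit(word, odd) == bit


print(parity_bit(0b1011))            # 1 (three ones, even parity)
print(parity_bit(0b1011, odd=True))  # 0
print(parity_ok(0b1010, 1))          # False (single-bit error detected)
```

Both ends must compute the bit with the same `odd` setting, which is the sense in which the PC and EVA ends must observe the chosen protocol.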

2.1.2 DT Connect Interface

One of the planned interfaces for the VPH was a DT Connect interface.
Inclusion of this interface would allow systems to be implemented using Data
Translation’s data acquisition boards and possibly frame grabbers along with
the VPH as the processing engine. Such a system might be desirable in light
of the fact that Data Translation's data acquisition products appear to be
competitive in terms of bandwidth, etc., but their array processors aren't
very fast. (The DT7020 array processor appears to be their quickest
processor. This unit is rated at 8 Mflops peak, as compared to around 120
Mflops peak for the VPH.) The DT Connect interface is intended as a
high-speed data path between acquisition devices and processors which are in
close proximity to one another.

The DT Connect interface is very loosely defined. The definition
consists of the pinouts on the connectors, the timing for the data and
asynchronous handshake lines, and the electrical handshaking protocols
implemented with the handshake lines. No limits on cable length are stated,
but because the cabling is driven with conventional TTL drivers such as the
74ALS244 or 74AS244, and in light of the statement in the specification that
the data can be clocked at something over 10 MHz, it is obvious that cable
length will be limited to about 30 cm. This limitation could pose some
serious constraints on putting together a system using Data Translation
acquisition boards and a VPH.

The DT Connect interface is available only on Data Translation’s
products aimed at PC/AT-based systems. The need for a high-speed data path
between acquisition devices and processors in a PC-based system is obvious due
to the limited bandwidth of the ISA bus. Data Translation did the obvious
to alleviate this problem in establishing the DT Connect pathway between
their acquisition and processor boards. Because the VPH will not be on an AT
form factor card, the usefulness of a DT Connect interface for the VPH is
highly questionable.

As stated before, a practical limit on cable length is around 30 cm, and
this is about the length of cable that would be needed just to get the cable
out of the AT case. A cable long enough to exit the AT case and connect to a
VPH placed close to the AT would be well in excess of the 30 cm limit.

A number of possible solutions or partial solutions to this obstacle
present themselves. The simplest solution is possible due to the asynchronous
nature of the DT Connect interface. The VPH end of the interface can easily
govern the transfer rate. The transfer rate could therefore be limited to
ensure reliable data transfer across the cable. STC estimates that
the data clocking rate could be on the order of 2 MHz for an effective data
rate of 4 MB/s. This data rate is lower than that of the ISA bus itself,
which makes the solution seem very undesirable, since our existing ISA
interface can easily meet and probably beat the 4 MB/s rate of exchange. The
only obvious advantage to using a rate-limited DT Connect interface is that
the ISA bus would be free during transfers, which would in turn allow the AT
to perform some other processing task concurrently with the data transfer.
If Data Translation software were being used on the AT, the transfers across
the DT Connect interface could most likely be handled as standard DT Connect
transfers by the software. On the other hand, it is questionable whether
Data Translation software could handle control of the VPH. Depending on the
ability of their software to link in user-generated routines for non-Data
Translation system components, this solution might be totally unworkable if
a user intends to use Data Translation software. If a user is willing to
write all the software for driving the VPH and any Data Translation boards
present in his system, this is a possible solution. As stated before, the
speed penalty of the rate-limited interface calls the practicality of the
solution into question, even in a situation where the user is willing to
develop the necessary software.
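
The relationship between the clocking rate and the effective data rate
quoted above can be checked with a short calculation. This is only a sketch:
the 16-bit path width is an assumption inferred from the quoted figures
(2 MHz giving 4 MB/s), and the function and constant names are hypothetical.

```python
def effective_rate_mb_per_s(clock_mhz: float, width_bits: int) -> float:
    """Data rate in MB/s when one transfer of width_bits occurs per clock."""
    return clock_mhz * width_bits / 8


# Assumption: 16-bit data path, inferred from 2 MHz -> 4 MB/s in the text.
DT_CONNECT_WIDTH_BITS = 16

rate_limited = effective_rate_mb_per_s(2.0, DT_CONNECT_WIDTH_BITS)
unlimited = effective_rate_mb_per_s(10.0, DT_CONNECT_WIDTH_BITS)

print(f"rate-limited link: {rate_limited} MB/s")
print(f"link at 10 MHz:    {unlimited} MB/s")
```

Under the same assumption, a link clocked at the 10 MHz mentioned in the
specification would move five times as much data, which is the bandwidth
given up by rate limiting.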

Another solution which seems somewhat more practical would be
development of a combination ISA/DT Connect interface for the VPH. Such an
interface would have a paddle card at the AT end which would plug into the
ISA bus, provide DT Connect ports into the interface, and provide a
connector for cabling to the VPH. Such a scheme has a number of advantages,
including the likelihood that a design could be done which would impose
much less significant limitations on maximum data transfer rates. It is also
possible that such a scheme might allow the VPH to "look like" a Data
Translation board so that no problems would occur when using software
specific to Data Translation systems.

The disadvantages of this approach center on development time. An ISA
interface for the VPH already exists, and this design would need a good deal
of modification in order to be made compatible with both ISA and DT Connect.
An additional NRE and manufacturing charge would be incurred for production
of the AT paddle card. In addition, if the approach of making the VPH look
like a Data Translation board were taken, a great deal of research into the
protocols and architecture of Data Translation's processors would be
necessary. It might prove very hard to get the necessary information. Also,
a good deal of additional firmware development would be necessary if the VPH
were to emulate a Data Translation processor. Considering these points, STC
does not believe that this is a viable solution.

A partial solution would involve design of a fairly generic high-speed
interface for the VPH. This interface could provide the ability to develop an
AT paddle card to provide DT Connect translation at some future date. In
terms of development costs, this seems like a much better approach. In
addition, such a generic interface could provide the ability to develop
translators for any number of other buses and/or interfaces to which we might
want to connect at some future date. STC believes that the existing VPH-end
ISA interface may be modified to provide such a generic interface.
Modifications might include increasing the width of the I/O data paths and
increasing the amount of control logic in order to make the interface as
adaptable as possible.

2.1.3  VPH/CPH  Interface

A number of methods of implementing such an interface are possible, and
the best solution is an "augmented" VME link between the VPH and CPH. This
"augmented" VME link utilizes the standard 32-bit data path of the VME bus,
and additionally uses 32 of the user-definable bits on the bus as
additional data bits, making the effective width of the link 64 bits. This
enables a maximum data transfer rate of 80 Mbytes/second between the VPH and
CPH. This transfer rate stretches the limits of the VPH, and requires
changing how I/O occurs on the VPH board when VPH/CPH 64-bit transfers are
occurring. This high data transfer rate makes the added circuitry worthwhile,
since it greatly enhances the real-time capabilities of a CPH/VPH system, and
effectively cuts the number of bus cycles required for any given VPH/CPH
transfer in half, thereby reducing loading of the host bus.
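
The effect of widening the link can be illustrated with a small
calculation. This is a sketch only: the 40 MB/s baseline is the VME bus
maximum quoted elsewhere in this report for the 32-bit path, linear scaling
with width is an idealization, and the function names are hypothetical.

```python
import math

VME_BASE_WIDTH_BITS = 32
VME_BASE_RATE_MB_S = 40.0  # quoted VME maximum for the 32-bit data path


def bus_cycles(num_bytes: int, width_bits: int) -> int:
    """Number of bus cycles needed to move num_bytes over a path of width_bits."""
    return math.ceil(num_bytes / (width_bits // 8))


def peak_rate_mb_s(width_bits: int) -> float:
    """Idealized peak rate, assuming the cycle time does not change with width."""
    return VME_BASE_RATE_MB_S * width_bits / VME_BASE_WIDTH_BITS


block = 4096  # bytes, an arbitrary example block size
print(bus_cycles(block, 32), "cycles on the standard 32-bit path")
print(bus_cycles(block, 64), "cycles on the augmented 64-bit link")
print(peak_rate_mb_s(64), "MB/s idealized peak on the augmented link")
```

For any block size that is a multiple of eight bytes, the 64-bit link needs
exactly half the cycles of the 32-bit path.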

The CPH is also equipped with a VME interface through its VME buffer
board, although its VME interface is somewhat more rudimentary than that of
the VPH. This allows the VPH and CPH to be housed in a common enclosure.
This common enclosure actually contains two separate backplanes - a VME
backplane and a proprietary backplane for the CPH boards.

In previously proposed VPH architectures, the VPH/VME interface shared a
port of the 4-port SRAM with the ISA interface. This arrangement allowed
VME communications to occur transparently as far as the 68020 was concerned,
which would allow the 020 to do simple system traffic control concurrently
with VME transfers. The kinds of traffic control that might be performed
during VME communication would necessarily be limited to such things as
status updating or polling the status of the Zoran processes. A limitation
of this architecture is that the VME interface can access only the 4-port
SRAM space. Data or program code destined for other memory areas would
need to be transferred out of SRAM and into the actual destination by the
020. This puts additional demands on the 020 and also inflates real transfer
times because of the double transfers required.

In the current VPH architecture utilizing the MVME6000, the VME
interface has access to the entire VPH address space. This will allow the VME
interface to access data in any section of memory on the VPH board, including
the memory on the Zorans, eliminating the need for 020 transfers from 4-port
SRAM to actual destinations. The VME accesses the 4-port space through the
020's port via the 020 bus. This imposes the limitation that while VME
transfers are occurring, the 020 is essentially locked out and cannot perform
any local processing tasks. This limitation is of only small consequence,
especially when balanced against the elimination of double transfers that
require 020 control.

Transfers between the VPH and CPH are performed by using the VME
standard 32-bit data path and using 32 of the user-configurable bits to widen
the effective data width to 64 bits. The additional 32 bits of data are
written to/read from the ISA port of the 4-port SRAM. This allows for data
transfer rates far in excess of the bandwidth of a single port into SRAM
(about 50 MB/s) and effectively doubles the stated VME bus specification of 40
MB/s maximum.

Another advantage of the new VPH architecture using the MVME6000 is that
the VPH, by virtue of the capabilities of the 6000 chip, may be used as a VME
system controller. This is likely to have a large impact on marketability.
As system controller, the VPH allows standard VME system components such as
memory and data acquisition boards to function under VPH control, without
the need for an expensive VME host system or VME controller. This could be
very attractive to anyone who needs the capabilities of a VPH but does not
have a VME host system. It could also be attractive to anyone who does have
a VME host system, but would like their vector processor to be able to
master the system.

In terms of the immediate goals of this project, the new architecture
has a number of advantages. Primary among these advantages is the ability to
configure a CPH/VPH system which does not require a VME host system. With the
ISA interfaces resident on both the CPH and VPH, a very powerful processing
station may be configured with a CPH, a VPH, a good ISA machine, and the
previously described backplane and enclosure. A wide variety of off-the-shelf
data acquisition and interfacing boards are available for VME, so interfacing
such a CPH/VPH/ISA system to virtually any type of sensors or other data
sources should be relatively straightforward. Unusual or highly specialized
interfacing applications are handled by an appropriate VME-compatible
interface board (the VME buffer board in Section 3.2.5).

2.2  VPH  Architecture

The Vector Processor Hardware, or VPH, consists of 4 Zoran 325 DSP devices
and a 68020 floating-point processor configured to perform DSP operations in a
wave fashion. The 68020 can operate independently of the DSPs. The VPH is a
single board in a 9U VME quad-high footprint. It can interface to a 9U or 6U
VME platform. An MVME6000 master/slave controller device on the VPH assists
data transfer across VME systems.

The VPH block diagram is shown in Figure 13. Here, one can see that the
DSPs and the 68020 talk to a 4-port SRAM for data and program storage. A PC
interface is also provided for code development and system monitoring. The PC
interface is a fast parallel port data transfer. For 6U VME transfer an
additional VME buffer board is provided. The VPH is intended to be plug
compatible with the SUN workstations to enhance intensive numerical
computations via a set of provided math libraries.


2.2.1  ISA  Interface


An important task of the VPH project has been the study of the
host-to-VPH interfaces. Two such interfaces are possible - the primary VME
interface and a secondary ISA interface. The ISA interface will allow the VPH
to be configured into a PC/AT system. This allows development of a variety of
VPH software without the need for access to a VME machine. The ISA/VPH
combination is not a very efficient way in which to utilize the VPH due to the
limitations of the ISA bus, but should prove convenient for development
purposes, and may even be useful for some applications. A number of
extensions to the VME standard are also in existence. These extensions are
designed to improve certain aspects of VME system performance, and to add
flexibility to the VME bus. These extensions were examined to see if any of
their features are suited to the VPH. Two extensions were examined: the
VSB bus (VME Subsystem Bus) and the VXI bus (VMEbus Extensions for
Instrumentation).

The ISA interface is realized as a block of four 8-bit I/O ports on the
ISA side of the interface. Two of these ports form a 16-bit data port into
the VPH, while the other two ports form a 16-bit control/status register
through which the VPH may relay status information to the ISA host. The
ISA host also uses this same port to give control/command information to the
VPH. The basic command set includes Block Transfers to/from the VPH, Block
Moves between memory domains within the VPH, RESET of the VPH subsystem, and
commands to the 68020 to begin execution of internal code. The VPH will be
capable of interrupting the ISA host to indicate task completion. The
interrupt level used is user-selectable in order to configure the VPH into
most ISA systems without creating conflicts with other boards. The VPH also
posts task status in the status register area so that the host may poll this
register to look for task completion, rather than being interrupted. This
could be handy in some applications, but the main reason for this feature is
to prevent interrupt conflicts in ISA systems that have other resources using
all available user interrupts (a typical problem with ISA systems).

The ISA host is capable of interrupting the 68020 to initiate transfers
of data and/or commands, or it may poll the status registers to see if the VPH
is in an "idle" state which will allow the host to effect various operations
by setting specific bits in the command register.

Transfers of data to the VPH are accomplished with the help of a 16-bit
presettable up/down counter in the interface, which allows transfer of data
to contiguous locations in the 4-port SRAM with a single address being passed
to define the starting point for the block transfer. This allows for the
maximum possible data rates between the host and VPH.

The VPH interface is mapped into the ISA I/O space rather than PC memory
space. The interface is essentially a contiguous block of four I/O locations.
These locations will be user-selectable, since add-on cards use a wide variety
of the available I/O addresses. Because a block of only four locations will
be required by the VPH, a user should have no problem successfully
configuring the VPH into a system. This requirement of four contiguous
locations is small when compared to most add-on cards - even something as
simple as a serial port typically requires eight contiguous I/O locations.

The I/O mapped approach does not allow the ISA host to read or write a
specific location in a single transaction cycle since the address bus won't be
available to the interface. The address and data must be passed in two
separate cycles. Rates of data transfer could be seriously impacted by this
requirement. This problem is solved by giving the interface the ability to
provide incremental addresses for accessing contiguous locations in the SRAM,
eliminating the need for the host to provide an address for every word
transferred. This has virtually eliminated the potential performance penalty
of the I/O mapped approach, since the vast majority of transfers consist of
blocks of data rather than individual words.
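
The saving from the address counter can be seen in a small behavioral
model. This is a sketch under stated assumptions, not the actual interface
logic: the class and method names are hypothetical, the address load and
each data word are each counted as one I/O cycle, and only the up-counting
direction of the counter is modeled.

```python
class AutoIncrementPort:
    """Toy model of an I/O-mapped data port backed by an address counter."""

    def __init__(self, size_words: int) -> None:
        self.mem = [0] * size_words   # stands in for the 4-port SRAM
        self.counter = 0              # presettable address counter
        self.io_cycles = 0            # host I/O cycles consumed so far

    def load_address(self, addr: int) -> None:
        """Preset the counter (assumed to cost one I/O cycle)."""
        self.counter = addr
        self.io_cycles += 1

    def write_word(self, value: int) -> None:
        """Write one 16-bit word, then advance the counter."""
        self.mem[self.counter] = value & 0xFFFF
        self.counter += 1
        self.io_cycles += 1


def block_write(port: AutoIncrementPort, start: int, words: list) -> None:
    """Pass the starting address once, then stream the data."""
    port.load_address(start)
    for word in words:
        port.write_word(word)


def per_word_write(port: AutoIncrementPort, start: int, words: list) -> None:
    """Pass an address before every word, as a plain I/O-mapped port would need."""
    for offset, word in enumerate(words):
        port.load_address(start + offset)
        port.write_word(word)


data = [0x1111, 0x2222, 0x3333, 0x4444]
fast, slow = AutoIncrementPort(64), AutoIncrementPort(64)
block_write(fast, 8, data)
per_word_write(slow, 8, data)
print(fast.io_cycles, "cycles with the counter,", slow.io_cycles, "without")
```

For a block of N words the counter reduces the host traffic from 2N cycles
to N + 1, which approaches a factor of two for long blocks.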

2.2.2  VME  Interface  to  VPH

In VME interface design investigations, STC determined that one feature
the VPH VME interface must have is the ability to perform VME block transfers.
This will allow the highest data transfer rates possible. Designing this
ability into the interface provided some challenges.

In order to perform block transfers, the interface must include an
address counter for accessing the SRAM. This in itself is no real problem. A
state machine must be designed which clocks (increments) the counter at the
appropriate times within the block transfer. In addition, this state machine
must generate the A01 address bit to the SRAM address decoder, since this bit
is only valid on the first transfer cycle of a block transfer.

A block transfer begins with a normal byte, word, or longword transfer.
The transfer becomes a block transfer if the DS0 and DS1 data strobes are
released and then reasserted without a negation of the AS address strobe in
between. Once a block transfer has begun as described, the address strobe
remains asserted until the block transfer is complete, with individual
transfers being delineated by negation of both data strobes.

Once a block transfer has begun, the LWORD* and A01 and A02 - A31 bits
from the VMEbus are invalid. They are valid only for the first cycle of a
block transfer. The initial value of these bits sets up the block transfer,
and on subsequent transfers the interface circuitry must supply a valid
address and hold the LWORD* value which existed during the initial transfer
cycle.
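
The behavior required of the interface during a block transfer can be
summarized in a small behavioral model. This is a sketch only, not the PAL
equations: the class and method names are hypothetical, LWORD* is modeled as
a simple boolean, and the address step of 4 bytes per cycle assumes longword
transfers.

```python
class BlockTransferTracker:
    """Latches address and LWORD on the first cycle, then counts locally."""

    def __init__(self) -> None:
        self.as_asserted = False
        self.latched_lword = None   # LWORD value held for the whole block
        self.counter = None         # locally generated address
        self.cycles = 0             # data cycles seen since AS was asserted
        self.accessed = []          # addresses presented to the SRAM

    def assert_as(self, address: int, lword: bool) -> None:
        """Address strobe asserted: bus address and LWORD are valid now only."""
        self.as_asserted = True
        self.counter = address
        self.latched_lword = lword
        self.cycles = 0

    def negate_as(self) -> None:
        """Address strobe negated: the transfer (block or single) is over."""
        self.as_asserted = False

    def data_strobe(self, step: int = 4) -> None:
        """Data strobes asserted: one data cycle at the current address."""
        if not self.as_asserted:
            raise RuntimeError("data strobe without address strobe")
        if self.cycles > 0:
            # Strobes were released and reasserted with AS still asserted,
            # so the bus address is stale and the counter supplies the next one.
            self.counter += step
        self.accessed.append(self.counter)
        self.cycles += 1

    @property
    def in_block_transfer(self) -> bool:
        return self.as_asserted and self.cycles > 1


tracker = BlockTransferTracker()
tracker.assert_as(0x1000, lword=True)
for _ in range(3):
    tracker.data_strobe()
print([hex(a) for a in tracker.accessed], tracker.in_block_transfer)
tracker.negate_as()
```

The model shows the two duties described above: the address seen on the
first cycle seeds the counter, and the LWORD value captured at that time is
held unchanged until the address strobe is negated.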


The state machine to perform these functions would seem at first glance
to be relatively straightforward, but it was discovered that the machine is
not easy to implement in any simple way and still be able to keep up with the
timing requirements for maximum throughput. STC's design implements the
state machine in a single 20RA10 PAL.

The MVME6000 is designed for interfacing 68020/30 processors to the VME
bus. An analysis of this chip's specifications shows that the chip has a wide
range of functionality. With only a small handful of additional logic, the
MVME6000 may be used to create a VME/680x0 interface which conforms strictly
to the VME bus specification, and which includes all VME functions except BLTs
(block transfers), including all master/slave/system controller capabilities.




2.3  Summary  of  Interfaces


The EVA computer is comprised of several functional units, each of which
has multiple interfaces. Because of the versatile communication paths, the
previous sections centered on those available to a user. Two boards serve as
multiple interfaces. They are the IOP board, which interfaces the CPH modules
to the host, and the VME Buffer board, which interfaces to the CPH, VPH, and a
6U VME backplane so that the CPH can communicate with a VME system. The
interfaces are now listed for clarity.

Interface      Board               Description

PC to VPH      VPH daughterboard   VPH end of this interface,
                                   see Sections 2.2.1, 3.2.6.1

PC to CPH      IOP                 6U board plugs into CPH backplane,
                                   see Sections 2.1, 2.1.1, 3.2.4, 3.2.6

PC to ISA      PC-INT              ISA bus board plugs into 286 and 386,
                                   see Sections 1.1.2.3 and 2.1.1

VPH to CPH     VME Buffer          6U board plugs into CPH backplane,
                                   see Sections 2.1.3 and 3.2.5

VME to VPH     VPH                 Integral part of VPH board,
                                   see Section 2.2.2

VME to CPH     VME Buffer          same board used to interface CPH to VPH,
                                   also called SIO or Serial IO board,
                                   see Section 3.2.5

Internal CPH   HSIO                high speed IO bus that communicates among
                                   the CPH modules (processor, AG, IOP,
                                   cache memory), see Section 3.2.7

3.0  Theory  of  Operation

With this introduction to interfaces, the theory of operation section
describes the remaining architectural details of EVA. Section 3.1 starts with
the VPH and its internal register resources. Operating the VPH will require a
thorough understanding of the VPH to VME interface. Hence, the programmer's
model and the VPH address map are presented so that a programmer may know
which addresses on the VME bus correspond to internal VPH resources. The
address map is presented early because addresses for the 68020 are different
than those for the DSPs. (The DSPs are designed by the manufacturer to
address words. The 68020 can address bytes.) The VPH resources are numerous
and include control and status registers, two program RAMs (PRAMs 1 and 2),
a 4-port SRAM, and 68020 registers. Section 3.2 covers the CPH and its
resources, again very numerous, including the processor, cache, address
generator, IOP, and VME Buffer. Because some boards (e.g. the VME Buffer
board) serve multiple functions, it will be necessary to return to earlier
sections at times. The versatility of EVA is evident in its many interfaces
and operating modes. Those operating modes include VPH in VME systems (such
as the TSl tracker), CPH/VPH as EVA, and CPH in VME systems. Note that the
VME buffer board allows the CPH to be hosted by a system other than a PC.

The previous sections described the general architecture and interfaces
of the CPH and VPH. With this introduction it is now possible to discuss the
operation of both in more detail. The following sections begin with a
description of the VPH resources and end with those of the CPH. In the
process, additional architectural hardware details are presented as needed.
These are accompanied by the microinstruction format and machine definition
file for the CPH found in the appendices. To understand the theory of
operation of each functional unit it will be necessary to know much about the
individual address spaces, control signals, assembly language, and
microinstructions of the IOP, CPH, VME buffer board and PC interface board.
Such information is also presented in this section.

3.1  VPH 

The VPH-20 is a multi-processor DSP board suited to FFTs, FIR and IIR
filters, spectrum analysis, Kalman (and other) adaptive filters, and numerous
other DSP tasks. The VPH-20's processing power comes from four Zoran ZR34325
Vector Processor chips (arranged two chips on each of two buses) and one
Motorola 68020 microprocessor. The VPH-20's unique architecture allows
concentration of all processors on a single task for the highest processing
speed, or partitioning of the processing resources to handle multiple
simultaneous tasks. The VPH-20 performs a 1024-point complex FFT in as little
as 604 us at 20 MHz (483 us at 25 MHz).

The form factor of the VPH-20 is a standard 9U-4H (366.7 x 340.0 mm)
board. This is the standard VXIbus "D"-size board. The VPH-20 requires a
single slot in the VME/VXI backplane unless the optional PC interface
daughterboard is attached, in which case two slots are required. The VPH-20
may be used in any environment where a standard VMEbus is in existence,
including VXI systems. Since none of the user-definable pins are used by the
VPH-20, it may be used in many systems which are based on a VMEbus with
extensions, such as those of Sun Microsystems.

Integral to the VPH-20 is a standard VME bus interface which allows the
VPH-20 to operate in either Master or Slave modes. The system may also be
configured as a VME System Controller board. The system architecture allows
for transactions to occur on the VME bus without interfering with signal
processing operations.

An optional high-speed PC interface allows the VPH-20 to be tied to any
standard PC/AT-compatible computer. This interface may be used in conjunction
with the VME interface, allowing a PC to be used for any number of purposes
such as process monitoring, data display, etc.

Programmer's Model

A brief discussion of the system architecture, including a system memory
map and the programmer's model, follows. Documents which may be of additional
help include:

32-Bit Microprocessor User's Manual
Motorola #MC68020UM/AD

ZR34325 32-Bit Floating-Point Vector Signal Processor
Zoran Corporation #DS34325-0989-1.5K

MVME6000 VMEbus Interface User's Manual
Motorola #MVME6000UM/D1

The VPH-20's four vector processors are arranged with one pair of
processors on each of two local buses. Each bus has 32k longwords of
high-speed static RAM (SRAM) for the use of the two vector processors the bus
serves. In addition, each VSP bus may access one port of the system's
four-port SRAM. This four-port SRAM is a memory resource which is common to
all system resources; the use of such a memory area allows multiple resources
to access the same memory area simultaneously and without conflict - a single
memory location may be read from each of the four ports at the same time. The
size of the four-port SRAM is 4k longwords.

Another resource common to all five processors is a status latch which
provides a simple means of primitive semaphore communication between
processors. Each processor may write two status bits to the status
latch; a read of the latch yields the eight status bits from the other four
processors.

The 68020 has access to all system resources, including the local
memories on each of the VSP buses and the internal registers of the four VSP
chips themselves. The VSPs have access only to their local memory, the global
status latch, and the four-port memory. All off-board communication is
handled by the 68020.

The VMEbus interface is based on the Motorola MVME6000 interface chip.
This versatile arrangement allows the VPH-20 to function in the Master or
Slave modes, and also allows the VPH-20 to be configured as the VME system
controller. The VPH-20 may access the entire 32-bit VME address space. The
VPH-20's location in the VME address space is user-configurable over a wide
range.

The optional PC interface allows the VPH-20 to communicate with any
PC/AT-compatible machine. The PC interface is designed to provide much faster
communication between the PC and the VPH-20 than could be achieved with
conventional serial or parallel communication techniques, thereby making the
PC a useful addition to a system utilizing the VPH-20.

An examination of the Programmer's Model diagram in Figure 14 shows that
there remain two resources not yet discussed. The DSACK Generator handles the
task of terminating 68020 bus cycles at the appropriate time. Its operation
is normally transparent to the user, and need not be considered in most
situations. The Expansion Bus allows for the addition of any of a number of
68020-compatible subsystems, such as A/D and data acquisition, etc. Any
resource which is "tacked on" to the system expansion bus will have its bus
cycles terminated by the DSACK generator according to values loaded into the
DSACK RAM. These values define the cycle times (wait states) necessary for
addresses within the region of the 68020 address space reserved for system
expansion (the upper 2 Gbytes).

The following 68020 address map shows where the various resources reside
in the 68020 address space.


A more detailed discussion of individual resources follows.

Each Zoran bus (or VSP bus) serves two Zoran VSP chips and a 32k
longword area of local SRAM. In addition, each VSP bus has a port into the
four-port memory and can access the global status latch. The following VSP
address map shows the location of resources as seen by any one of the VSP
chips.


Hex Address        | Resource

0000 - 7FFF        | Local SRAM

2 0000 - 2 0FFF    | Four-port SRAM

4 0000             | Global Status Latch

Note that VSP addresses 2 0000h - 2 0FFFh correspond exactly with 68020
addresses 8 0000h - 8 3FFFh for both VSP buses. In addition, VSP Bus 1
addresses 0000h - 7FFFh correspond to 68020 addresses 10 0000h - 11 FFFFh and
VSP Bus 2 addresses 0000h - 7FFFh correspond to 68020 addresses 18 0000h - 19
FFFFh. The apparent difference in address ranges between the 68020 and the
VSPs is due to their respective methods of addressing. The VSPs can only
access longword memory locations, whereas the 68020 can access individual
bytes. The 68020 then has, in effect, two more least significant
address bits than the VSPs. The difference in address ranges and their
locations is very important to the programmer. The following table should be
of assistance in converting the addresses of common resources between the
various buses; the programmer should thoroughly familiarize himself/herself
with this table.

To Convert VSP:       To 68020:         Use Formula:

Bus 4-port Address    4-port Address    (VSP addr. - 2 0000h)*4 + 8 0000h

Bus 1 SRAM Address    Address           (VSP addr.)*4 + 10 0000h

Bus 2 SRAM Address    Address           (VSP addr.)*4 + 18 0000h


To Convert 68020:     To VSP:           Use Formula:

4-port Address        Bus 4-port        (020 addr. - 8 0000h)/4 + 2 0000h
                      Address

Bus 1 Address         Address           (020 addr. - 10 0000h)/4

Bus 2 Address         Address           (020 addr. - 18 0000h)/4
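
The conversions can also be written as small helper functions. This is a
sketch, with hypothetical function names; the factor of 4 reflects the two
extra least significant address bits of the 68020 (byte addressing) relative
to the longword addressing of the VSPs, and the base addresses are those
given in the note above.

```python
FOUR_PORT_VSP_BASE = 0x2_0000
FOUR_PORT_020_BASE = 0x8_0000
LOCAL_SRAM_020_BASE = {1: 0x10_0000, 2: 0x18_0000}  # keyed by VSP bus number
BYTES_PER_LONGWORD = 4


def vsp_to_020_four_port(vsp_addr: int) -> int:
    """VSP four-port address -> 68020 byte address of the same longword."""
    return (vsp_addr - FOUR_PORT_VSP_BASE) * BYTES_PER_LONGWORD + FOUR_PORT_020_BASE


def vsp_to_020_local(vsp_addr: int, bus: int) -> int:
    """VSP local SRAM address on the given bus -> 68020 byte address."""
    return vsp_addr * BYTES_PER_LONGWORD + LOCAL_SRAM_020_BASE[bus]


def m020_to_vsp_four_port(addr: int) -> int:
    """68020 byte address in four-port SRAM -> VSP longword address."""
    return (addr - FOUR_PORT_020_BASE) // BYTES_PER_LONGWORD + FOUR_PORT_VSP_BASE


def m020_to_vsp_local(addr: int, bus: int) -> int:
    """68020 byte address in a local SRAM -> VSP longword address on that bus."""
    return (addr - LOCAL_SRAM_020_BASE[bus]) // BYTES_PER_LONGWORD


print(hex(vsp_to_020_four_port(0x2_0FFF)))   # last four-port longword
print(hex(vsp_to_020_local(0x7FFF, 1)))      # last Bus 1 local longword
print(hex(m020_to_vsp_local(0x19_FFFF, 2)))  # last byte of Bus 2 local SRAM
```

Integer division is used in the 68020-to-VSP direction so that any of the
four byte addresses within a longword maps to the same VSP address.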


3.1.1  VPH  Internal  Control

It is important to know how internal controls operate on the VPH since a
user will be coding directly to Zoran status latches, Zoran program memory
space (PRAMs 1 and 2), and the 4-port SRAM. The following information
describes address and status latch maps.

To write to DSACK SRAM:

Write to any address such that A[31..29]=[001]. This disables address
buffers and allows access to the DSACK SRAM, which is addressed with the
vector A[31,24..18].

Write to any address such that A[31..29]=[011] to disable DSACK SRAM
load mode and re-enable address buffers.

To  galn/relinqulsh  control  of  the  VME  bus: 

Write  to  any  address  such  that  A[31.  .22,20. .  18]-0,  A[21]-[l],  and 

A[2,l]«[01]  to  request  mastership  (byte  or  word  access. 

Relinquish  the  VME  bus  by  reading  A(31 . .22,20. . 18]-0,  A[21]-[l],  and 
A[2,l]>[01]  (byte  or  word  access). 

Status latch access:

The 68020 may access the status latch at address A[31..21]=0,
A[20..18]=[111], A[2]=0. When reading the latch, the bit pattern is:

D[7] D[6]  |  D[5] D[4]  |  D[3] D[2]  |  D[1] D[0]
-----------+-------------+-------------+------------
Bits from  |  Bits from  |  Bits from  |  Bits from
Zoran #4   |  Zoran #3   |  Zoran #2   |  Zoran #1

In addition, D[27..16] reflect PC Interface status bits STAT[11..0]
when the interface is on board.

When writing to the latch, the bit pattern is:

D[1] D[0]     (All other bits are don't cares.)
---------
Bits from
68020

The 68020 may send RESET commands to any of the Zorans by writing to
A[31..21]=0, A[20..18]=[111], A[2]=1. The bit pattern is:

D[3]   |  D[2]   |  D[1]   |  D[0]      (All other bits are don't cares.)
-------+---------+---------+-------
Zoran  |  Zoran  |  Zoran  |  Zoran
#4     |  #3     |  #2     |  #1

A '1' written to one of these bit positions causes the appropriate Zoran
to be reset and put in the SLAVE mode.
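
The bit layouts above can be illustrated with a small Python sketch. It is a host-side model for checking values only; the function and constant names are assumptions made for this example. The addresses follow from the address-bit description (A[20..18]=111 gives $1C0000, and setting A[2] gives $1C0004).

```python
# Illustrative model of the status latch layout; names are hypothetical.
STATUS_LATCH_ADDR = 0b111 << 18                   # A[20..18]=111, A[2]=0 -> $1C0000
ZORAN_RESET_ADDR = STATUS_LATCH_ADDR | (1 << 2)   # same, with A[2]=1  -> $1C0004


def decode_status(value):
    """Split a longword read from the status latch into its fields.

    Returns ({zoran_number: two_bit_field}, pc_interface_status).
    Zoran #1 occupies D[1..0], Zoran #2 D[3..2], and so on up to Zoran #4.
    """
    zoran_bits = {n: (value >> (2 * (n - 1))) & 0b11 for n in range(1, 5)}
    pc_status = (value >> 16) & 0xFFF             # D[27..16]
    return zoran_bits, pc_status


def reset_mask(*zorans):
    """Build the value written to the reset address to reset the given Zorans."""
    mask = 0
    for n in zorans:
        if not 1 <= n <= 4:
            raise ValueError("Zoran number must be 1..4")
        mask |= 1 << (n - 1)                      # Zoran #n is reset by D[n-1]
    return mask
```

Resetting all four Zorans corresponds to `reset_mask(1, 2, 3, 4)`, which is $F.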

PC Interface access:

The base address for access to the PC Interface is at A[31..22,20,19]=0,
A[21,18]=[11]. In addition, A[3,2] are used to access specific resources
within the interface. All accesses to PC Interface registers are longword
accesses, but only D[15:0] are used.

To read or write the FIFO, A[3,2]=[00].

To read the status register or write the control register, A[3,2]=[01].
(The status register may also be read by reading the status latch as described
above.)

To read or write the interrupt register, A[3,2]=[10].
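
As a quick reference, the register addresses can be derived as in the Python sketch below. The names are hypothetical. The base value of $240000 is the one the DSACK SRAM initialization code later in this chapter labels as the PC Interface base address; it corresponds to address bits A21 and A18 being set.

```python
# Illustrative address map for the PC Interface; names are hypothetical.
PC_IF_BASE = (1 << 21) | (1 << 18)        # $240000

# Values placed on A[3,2] to select each resource.
PC_IF_SELECT = {
    "fifo": 0b00,                # read or write the FIFO
    "status_control": 0b01,      # read status register / write control register
    "interrupt": 0b10,           # read or write the interrupt register
}


def pc_if_addr(resource):
    """Return the 68020 longword address of a PC Interface resource."""
    return PC_IF_BASE | (PC_IF_SELECT[resource] << 2)


def pc_if_data(longword):
    """Keep only D[15:0], the bits the interface actually uses."""
    return longword & 0xFFFF
```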
PC Interface registers:


Control  Register 


STAT  0  &  1  -  These  are  general  purpose  Interface  bits.  A  bit  written  to 
STAT  0  or  1  In  the  Control  Register  appears  as  STAT  0  or  1  In  the  Status 
Register  at  the  other  end  of  the  interface. 

SEND  -  This  bit  Is  an  enable  for  the  sending  of  data  across  the 
Interface.  A  0  written  to  this  bit  does  not  disable  the  ability  to  write  to 
the  output  FIFO,  but  does  prevent  data  In  the  output  FIFO  from  being  sent 
until  a  1  Is  written  to  this  bit. 

RECEIVE - This bit is an enable for the receiving of data across the
interface. A 0 written to this bit does not disable the ability to read data
in the FIFO, but does prevent the FIFO from receiving additional data until a
1 is written to this bit.

RESET - A 1 written to this bit resets the entire interface. The FIFOs
are cleared and zeros are written to all bits of all three registers. (This
effectively clears the RESET command once it has been effected.)

CLK  0,1,2  -  These  bits  set  the  rate  at  which  output  data  Is  clocked 
across  the  Interface. 

ODD*/EVEN - This bit selects odd or even parity across the interface.

NMSTIO  -  Setting  this  bit  makes  a  high  level  on  the  Incoming  STAT  0  the 
highest  priority  Interrupt,  thus  giving  the  PC  priority  over  any  VME 
Interrupts.  (The  level  of  the  request  as  passed  to  the  68020  Is  set  by  bit 
15.) 


ENINT  -  This  Is  an  enable  for  PC  Interrupts. 

CLRIRT* - A 1 written to this bit clears all PC interrupts. The bit does
not self-clear, so a 0 must be written to this bit after interrupts have been
cleared.

LSEL0,1,2 - These bits set the level of the interrupt passed to the
68020 in response to a PC interrupt request. (A request via the STAT 0 line
has its interrupt level set by bit 15 rather than by these three bits.)

STOILEV - This bit determines the interrupt level passed to the 68020
(level 3 or 7) in response to a PC interrupt request on STAT 0.

Status Register


Interrupt Mask Register


[Register bit-layout diagrams for bits 15-0 are illegible in the source.]


3.1.2 VPH Control Signals

When performing board level diagnostics or reprogramming PALs, the
following signals may be needed. They are listed for completeness. Should
future WSMR applications call for functional design changes, these sources of
PAL signals will assist in the process. The device and signal names refer to
VPH schematic labels. The schematic is an E-size drawing (3'x4') and is
provided separately from the Final Technical Report.

A-PORT  SRAM 

68K PORT -      /OE2 GROUNDED
                /CE2 (FOR EACH BYTE) FROM 4PORTCS PAL
                /WR2 FROM U139 (BUFFERED R/W)

ZORAN PORT 1 -  /OE4 FROM ZDEC2 PAL (/CE1 OUTPUT)
                /CE4 FROM ZDEC2 PAL (/CE2 OUTPUT)
                /WR4 FROM ZDEC2 PAL (/WRA OUTPUT)

ZORAN PORT 2 -  /OE1 FROM ZDEC2 PAL (/CE1 OUTPUT)
                /CE1 FROM ZDEC2 PAL (/CE2 OUTPUT)
                /WR1 FROM ZDEC2 PAL (/WRA OUTPUT)

BLT64  PORT  -  /OEl 
/CE2 
imi 

ZORAN 1 & 2 PRAM

R/W - FROM ZDEC2 PAL (/WRA OUTPUT)
/OE - FROM ZDEC2 PAL (/OEA OUTPUT)
/CE - FROM ZDEC2 PAL (/CE3 OUTPUT)

68K EPROM

/OE & /CE (FOR EACH BYTE) FROM 68KMEMCS PAL

NOTE: PIN 1 ON EACH EPROM IS SELECTABLE VIA JMP1 JUMPER TO BE EITHER +5V
OR AN UPPER ADDRESS BIT. THIS ALLOWS EITHER 128K OR 256K EPROMS TO BE USED.

68K SRAM

R/W - FROM U139 (BUFFERED R/W)
/CE & /OE - (FOR EACH BYTE) FROM 68KMEMCS PAL

ZORAN BUS ARBITRATION

Arbitration on each of the 2 Zoran buses is handled by a group of 4 PALs
- ZARB, ZDEC1L, ZDEC1H, and ZDEC2. These PALs handle generation of all control
signals related to operation of the bus, including processors, memory (both
local and 4-port), and status latch. RESET is not handled by these PALs.

ZARB PAL - This PAL handles most of the bus arbitration functions.
Inputs to the PAL include Block Select signals for the Zorans and PRAM on the
bus, Bus Request signals from each Zoran, WRITE signals from each Zoran, and a
R/W signal from the 020.

Outputs include Bus Grant signals to each Zoran, a GEN signal which
enables the 020 to Zoran bus transceivers, ZDDIR and ZADIR signals which
control direction of the Zoran address and data bus transceivers, and 2
qualified Block Select signals which are used by other control circuitry.

ZDEC1x PALs - These PALs provide decoding and generation of control
signals to the Zorans. The ZDEC1L PAL handles the lower-numbered Zoran, the H
PAL handles the higher-numbered one. The control signals these PALs handle are
the Zoran Chip Selects, Data Strobes, Reads and Writes, and the Ready signals.

ZDEC2 PAL - This PAL handles generation of WRITE and Chip Enables for
local PRAM, Chip Enables for the 4-port SRAM and PRAM, and a Status Latch
Enable.
DSACK GENERATOR

The DSACK generator handles generation of DSACK signals to the 020.
These signals require different timing for the various different memory spaces
in the system. The DSACK generator consists of 2 PALs, DSGEN and ROMPAL, a
small SRAM, and a switch setup for setting default wait cycle lengths.

ROMPAL PAL - This PAL acts as a 9 x 4 ROM containing configuration data
for 8 blocks of memory. A G output serves to disable the 020 address and data
bus buffers when the DSACK generator SRAM is being loaded. The G signal also
acts as an input to the DSGEN PAL for correct DSACK generation during SRAM
loading. A Write Enable is output to the DSACK SRAM, as is an Output Enable. 4
configuration bits (CBIT0-3) are output to the DSGEN PAL.

DSGEN PAL - This PAL handles the generation of the actual DSACK0-1
signals to the 020.

3.1.3 VPH Configuration Procedures

There  are  a  number  of  hardware  and  system  level  considerations  to  take 
into  account  when  configuring  the  VPH.  The  following  sections  will  address 
some  possibly  critical  issues  and  outline  the  procedures  for  configuring  the 
VPH  hardware.  Switch  and  jumper  settings  will  be  treated,  as  will  "software" 
configuration  of  board  and  system  functions. 

3.1.3.1 System Controller Selection

In a VME system, slot 1 of the backplane (usually the leftmost slot as
viewed from the front) is reserved as the system controller slot. The board
performing the system controller function drives the VME 16 MHz system clock
line, the IACK daisy chain, and the BG0-3 daisy chains. The system controller
also provides bus arbitration for the system.

The VPH may be configured as either a standard VME board or as the VME
system controller. This is accomplished with JMP2 on the VPH board. This
jumper is located near the MVME6000 chip, which is the one with the cooling
tower on it. With the jumper in position 1 (shorting pins 1 and 2) the board
is NOT the system controller. With the jumper in position 2 (shorting pins 2
and 3) the VPH is configured as the system controller.

Configuration of the board's VME bus arbitration module is necessary
when the VPH is configured as the system controller. A discussion of how to
do this may be found in the section "LCSR DESCRIPTION".

Please note that a board configured as the system controller may be
positioned ONLY in slot 1 of the VME backplane; a VME system may be comprised
of many boards but only the board in slot 1 may be a system controller.

3.1.3.2 020 EPROM Size Selection

The VPH is designed so that a number of different sizes of EPROMs may be
used. The EPROMs are socketed in ZIP sockets for ease of code development.
128, 256, or 512 kbit EPROMs may be used by proper setting of JMP1 and JMP5,
which are located near the EPROMs. The table below indicates proper jumper
settings for each of the three EPROM sizes. Note that position 1 indicates
that the jumper is shorting pins 1 and 2; position 2 indicates that pins 2 and
3 are shorted.


EPROM Size  |  Jumper Position
(kbits)     |  JMP1  |  JMP5
------------+--------+-------
512         |   2    |   2
256         |   1    |   2
128         |   1    |   1

3.1.3.3 GCSR Base Address Selection

The GCSRs (Global Control and Status Registers) are a resource
associated with the VME interface. This group of 8 registers is physically
located on the MVME6000 chip. A detailed description of the GCSR may be found
in the section "GCSR DESCRIPTION". This section is dedicated to setting the
GCSR base address.

The VPH GCSRs, as viewed from the VME bus, are located in the VME's
Short Supervisory Access space (AM code $2D), which utilizes 16-bit addresses.
This address space is typically partitioned in the following manner.

The upper 8 VME address bits (A15-A8) are used to define a Group
Address. The next four bits (A7-A4) are used to address a board within a
group. The lower 3 bits (A3-A1) are used to address a specific resource of a
board within a group. This partitioning concept isn't hard and fast, but many
boards conform to this structure. The VPH's VME interface GCSRs are located
in this address space, and configuration is necessary to position the GCSRs at
a specific location in the short I/O space.

The GCSR base address, referred to above as the "group address", is
determined by the setting of S1 on the VPH board. This switch is an 8-pole
DIP switch located next to the top edge of the board. The lowest bit of this
switch corresponds to VME A8; the highest bit of this switch corresponds to
A15. A switch in the "on" position selects a zero for a given bit, the "off"
position selects a one.

EXAMPLE: To set the GCSR group address to $8Dxx, the S1 switch
settings, from highest (S1-8) to lowest (S1-1), would be:

off on on on off off on off
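
The same settings can be worked out for any group address with a few lines of Python. This is only a planning aid for whoever sets the switches; the function name is an assumption made for the example. It applies the rule stated above: "on" selects a zero and "off" selects a one, with S1-8 mapped to A15 and S1-1 to A8.

```python
def s1_settings(group_address):
    """Return S1 positions, ordered S1-8 (A15) down to S1-1 (A8).

    group_address is the upper byte of the 16-bit short address (A15-A8).
    """
    if not 0 <= group_address <= 0xFF:
        raise ValueError("group address must fit in 8 bits")
    # A set bit needs the switch "off"; a clear bit needs it "on".
    return ["off" if (group_address >> bit) & 1 else "on"
            for bit in range(7, -1, -1)]
```

Calling `s1_settings(0x8D)` reproduces the example above.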

The GCSR board address is configured through software by writing the
desired value for A7-A4 into the register at an offset of $1B from the base
address of the LCSR. (This procedure is covered in the section "LCSR
DESCRIPTION".) The lowest 3 bits (A3-A1) are decoded by the MVME6000 to
access one of the 8 registers of the GCSR.
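
Putting the three fields together, the short-space address of a given GCSR register can be sketched as follows. The function name and argument names are hypothetical; the field positions are the ones described in this section (group in A15-A8, board in A7-A4, register in A3-A1).

```python
def gcsr_address(group, board, register):
    """Compose the 16-bit VME short address of one GCSR register.

    group    -> A15-A8 (set by S1)
    board    -> A7-A4  (written by software through the LCSR)
    register -> A3-A1  (selects one of the 8 GCSR registers)
    """
    if not (0 <= group <= 0xFF and 0 <= board <= 0xF and 0 <= register <= 7):
        raise ValueError("field out of range")
    return (group << 8) | (board << 4) | (register << 1)
```

With the group address of the example ($8D), board 3, register 0, this gives $8D30.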
3.1.3.4 VME Slave Address Modifier Code Selection


The Address Modifier (AM) code that the VPH's VME slave will respond to
is configured through a combination of hardware and software means. This
section deals primarily with the hardware configuration; more information on
the software configuration may be found in the section "LCSR DESCRIPTION".

Decoding of the VME AM bits is done by both the MVME6000 and U135 on the
VPH board. This has been done in order to allow more versatility in mapping
the VPH into the VME address space than is allowed by the MVME6000 alone. A
discussion of the MVME6000's AM decoding may be found in the section "LCSR
DESCRIPTION" or in the MVME6000 hardware manual. (Note that the MVME6000
always sees a zero on AM4 regardless of the level actually present on the
bus.) The following section describes the decode functionality of the U135
PAL; two versions of this PAL have been supplied to provide two different
mapping sets for the VPH VME slave. Information contained in this and other
sections should allow creation of additional PALs to provide other slave
mappings.

The function of U135 is to look at the AM code present on the VME bus
and determine if the AM code present is correct for an access to the VPH's VME
slave. When a valid AM code is detected, an enable signal (MATCH32) is passed
on to the MVME6000 to enable the VME slave. The MVME6000 then re-qualifies
the AM code, with AM4 presented as a zero regardless of the level on the bus.
This allows the VPH slave to respond to VME AM codes that the MVME6000
would normally reject.

The "MATCH" version of U135 maps the VPH slave to one of the normal VME
AM code sets. In order to enable the slave, the AM code must have the upper
two bits low. The lower four bits are compared to the setting of the switches
on S2 to complete the decode. S2-1 through S2-4 correspond to AM0 through
AM3, respectively. This allows the slave to respond to the AM codes in the
range $00 through $0F. However, within this group of AM codes, $00 through
$08 are reserved, as is $0C. The MVME6000 cannot be made to respond to these
codes. In addition, the MVME6000 is not capable of block transfers, so codes
$0B and $0F are also eliminated. The remaining four codes, their VME transfer
types, and the value that must be loaded to the MVME6000's LCSR $0B slave
address modifier register (020 address $28000B) are summarized below.

AM Code  |  VME Transfer Type                      |  Register Value
---------+-----------------------------------------+----------------
$09      |  Extended Nonprivileged Data Access     |  0bX11XX0X1
$0A      |  Extended Nonprivileged Program Access  |  0bX11XX01X
$0D      |  Extended Supervisory Data Access       |  0b1X1XX0X1
$0E      |  Extended Supervisory Program Access    |  0b1X1XX01X



The "MATCHA" version of U135 allows mapping of the VPH slave into AM
codes $10 through $1F. These are "User Defined" address regions. Keep in
mind that since the MVME6000 always sees a zero on AM4, the AM code seen by
the MVME6000 will be $10 less than the value actually present on the bus. In
order to ensure response from the MVME6000, it is recommended that only codes
$19, $1A, $1D, and $1E be used. It is possible that other AM codes within
this block would be acceptable to the MVME6000, but this would have to be
established through experimentation; it is easier just to utilize one of the
four prescribed patterns. These AM codes and the Address Modifier Register
values are summarized below. Note that all VME transfer types are actually
"User Defined" - the transfer type shown is the type assumed by the MVME6000.

AM Code  |  Transfer Type                          |  Register Value
---------+-----------------------------------------+----------------
$19      |  Extended Nonprivileged Data Access     |  0bX11XX0X1
$1A      |  Extended Nonprivileged Program Access  |  0bX11XX01X
$1D      |  Extended Supervisory Data Access       |  0b1X1XX0X1
$1E      |  Extended Supervisory Program Access    |  0b1X1XX01X



Other mappings are certainly possible. DO NOT ATTEMPT TO MAP THE VPH
SLAVE INTO ANY 16- OR 24-BIT ADDRESS SPACES! The VPH's address decoders
require a full 32-bit address even though most of its resources are located
within the lower 24-bit region. An attempt at mapping the slave into a 16- or
24-bit address space will likely result in system failure, since the upper
address bits may not appear as expected. (One would expect the upper bits to
be a sign extension of the 16- or 24-bit address, which for most 24-bit
accesses would work. But if the upper bits float high, or if the sign bit is
a "1", accesses would fail.)


New design files for U135 could be created easily to make the VPH slave
respond to any of a group of AM codes. As an example, a possible alternate
design file is shown below which would allow the slave to respond to any
combination of AM codes $19, $1A, $1D, or $1E. (The appropriate value loaded
to the slave address modifier register would depend upon the selected codes;
0b111XX011 would work for any selected combination for this example.) The
function of S2 is shown below.


S2-1  Enable accesses on AM code $19 when "ON"
S2-2  Enable accesses on AM code $1A when "ON"
S2-3  Enable accesses on AM code $1D when "ON"
S2-4  Enable accesses on AM code $1E when "ON"
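
The intended behavior of this alternate decode can be modeled in a few lines of Python. This is a behavioral sketch of the switch table above and of the AM4 masking described earlier in this section, not a translation of any PAL equations; the names are assumptions made for the example.

```python
# Behavioral model only; names are hypothetical.
S2_AM_CODES = {1: 0x19, 2: 0x1A, 3: 0x1D, 4: 0x1E}   # S2 switch -> AM code


def slave_enabled(am_code, s2_on):
    """True if the AM code on the bus is enabled by the S2 switches that are ON.

    s2_on is the set of S2 switch numbers (1..4) in the "ON" position.
    """
    return any(S2_AM_CODES[switch] == am_code for switch in s2_on)


def am_seen_by_mvme6000(am_code):
    """The MVME6000 always sees AM4 as zero, so $1x appears to it as $0x."""
    return am_code & 0x3F & ~0x10
```

With switches 3 and 4 on, `slave_enabled(0x1D, {3, 4})` is true, and the MVME6000 then qualifies the access as code $0D.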


For instance, to allow slave access on codes $1D or $1E, turn switches 1
& 2 off, switches 3 & 4 on. The following PAL file for the MATCH PAL is vital
to future changes to the VPH. It is included (verbatim) for complete
understanding.
;PALASM Design Description

;---------------------------------- Declaration Segment ------------
TITLE    MATCH32 AND MATCHGCSR DECODER PAL
PATTERN  MATCHB.PDS
REVISION 00                                  ;
AUTHOR   LARRY HALL
COMPANY  SPACE TECH CORP.                    ;
DATE     07/31/92                            ;

CHIP  MATCH  PAL22V10

; THIS PAL GENERATES TWO ENABLE SIGNALS WHICH ARE
; USED BY THE MVME6000 TO DETERMINE IF AN ADDRESS
; ON THE VME BUS BELONGS TO AN ON-BOARD RESOURCE.
; IT ALSO PERFORMS 020 BUS ARBITRATION BETWEEN THE
; 020 AND THE 6000, AND PROVIDES A 10 MHZ CLOCK FOR
; THE 6000 BY DIVIDING THE 20 MHZ CLOCK BY TWO.
; /MATGCSR INDICATES THAT THE 6000'S GCSR IS BEING
; ACCESSED. /MATCH32 INDICATES THAT THE VME IS
; ACCESSING THE VPH'S 32-BIT ADDRESS SPACE. THE
; /MATCH INPUT IS THE OUTPUT FROM A 688 COMPARATOR
; WHICH COMPARES THE A08-A15 BITS TO A VALUE SET
; ON AN 8-BIT DIPSWITCH WHICH DEFINES THE 'GROUP
; ADDRESS' OF THE GCSR IN THE VME SHORT ADDRESS
; SPACE. CLK IS THE 20MHZ CLOCK. THE B0-B3 INPUTS
; ARE FROM A DIPSWITCH USED TO DEFINE THE AM CODE
; USED TO ACCESS THE VPH FROM THE VME. THIS AM CODE
; IS REQUIRED TO HAVE BIT 5 LOW AND BIT 4 HIGH. THE
; ACCEPTABLE AM CODES ARE SUMMARIZED IN THE TABLE
; BELOW, ALONG WITH THE VME BUS SPEC'S DEFINITION OF
; THE AM CODE SEEN BY THE MVME6000 CHIP.
;
;   AM CODE   TRANSFER TYPE
;
;   $19       EXTENDED NONPRIVILEGED DATA ACCESS
;   $1A       EXTENDED NONPRIVILEGED PROGRAM ACCESS
;   $1D       EXTENDED SUPERVISORY DATA ACCESS
;   $1E       EXTENDED SUPERVISORY PROGRAM ACCESS
;
; /BGACK IS USED BOTH AS THE /BGACK INPUT TO THE 020
; AND AS THE /PBG INPUT TO THE 6000. /BR IS THE /BR
; INPUT TO THE 020. /DSACK0-1 ARE THE 020 /DSACK0-1
; LINES. /BG IS FROM THE 020. /PBR IS FROM THE 6000.

;---------------------------------- PIN Declarations ---------------
PIN  1   CLK                        ; INPUT
PIN  2   AM0                        ; INPUT
PIN  3   AM1                        ; INPUT
PIN  4   AM2                        ; INPUT
PIN  5   AM3                        ; INPUT
PIN  6   AM4                        ; INPUT
PIN  7   AM5                        ; INPUT
PIN  8   /AS                        ; INPUT
PIN  9   /B0                        ; INPUT
PIN  10  /B1                        ; INPUT
PIN  11  /B2                        ; INPUT
PIN  12  GND                        ;
PIN  13  /B3                        ; INPUT
PIN  14  /MATCH                     ; INPUT
PIN  15  /MATGCSR   COMBINATORIAL   ; OUTPUT
PIN  16  /BGACK     REGISTERED      ; OUTPUT
PIN  17  /BR        REGISTERED      ; OUTPUT
PIN  18  /MATCH32   COMBINATORIAL   ; OUTPUT
PIN  19  CLK10      REGISTERED      ; OUTPUT
PIN  20  /PBR                       ; INPUT
PIN  21  /DSACK1                    ; INPUT
PIN  22  /DSACK0                    ; INPUT
PIN  23  /BG                        ; INPUT
PIN  24  VCC                        ;

;---------------------------------- Boolean Equation Segment -------
EQUATIONS

MATGCSR = AM5 * /AM4 * AM3 * AM2 * /AM1 * AM0 * MATCH

MATCH32 = /AM5 * /AM4 * AM3 * /AM2 * /AM1 * AM0 * B0
        + /AM5 * /AM4 * AM3 * /AM2 * AM1 * /AM0 * B1
        + /AM5 * /AM4 * AM3 * AM2 * /AM1 * AM0 * B2
        + /AM5 * /AM4 * AM3 * AM2 * AM1 * /AM0 * B3

BR = PBR * /BGACK

BGACK = PBR * BG * /AS * /DSACK0 * /DSACK1
      + BGACK * AS
      + BGACK * DSACK0
      + BGACK * DSACK1
      + BGACK * PBR

CLK10 = /CLK10

;---------------------------------- Simulation Segment -------------
SIMULATION

;-------------------------------------------------------------------


3.1.3.5 Initialization Considerations

It is expected that the need will exist to develop a wide range of
application code for the VPH in the future. Since the board is not supplied
with any type of operating system, the system programmer developing code
for the VPH needs to be aware of proper initialization procedures for the
various VPH resources. Such initializations are necessary at power-up, and
possibly at any other time that the VPH is "reset" or reconfigured as required
by some process. The following section discusses these considerations.

At power-up or other reset, the VPH's 68020 will begin execution at
address 0 in EPROM. The initialization sequence is the standard sequence as
described in the 68020 User's Manual; the first few locations in EPROM contain
initial stack pointers, the execution start address, etc.

It is recommended that the boot sequence for the 020 load the SFC and
DFC registers with $3, as this is the function code used for accesses to VME
via the MOVES instruction.

If  the  PC  Interface  is  to  be  used,  the  control  registers  for  the 
interface  must  be  set  up  appropriately.  See  sections  on  the  PC  Interface  for 
more  information. 

When the VPH wakes up, the DUB bit at 020 address $200002 will be
asserted. This causes the VPH to request the VME bus and, once the bus is
granted, to hold it until the DUB bit is negated. The bit should be negated
early in the boot sequence so as not to interfere with other boards' ability
to complete their boot sequences. Negating the DUB bit may be accomplished by
doing a byte read of location $200002 in VPH local memory space.

Proper initialization of Local and Global Status Registers will be
required before the VPH's VME slave and/or master will function properly.
Information on the MVME6000's LCSR and GCSR may be found elsewhere, either in
this document or in the MVME6000 User's Manual. There is no hard and fast
rule as to how to set up the MVME6000; the necessary initialization will
depend upon the application and overall system configuration, and must be
determined by the system programmer.

One  thing  that  will  need  to  be  done  in  nearly  any  situation  at  boot  is 
to  clear  the  BRDFAIL  bit  in  the  System  Controller  Configuration  Register  in 
the  LCSR.  If  this  is  not  done,  the  SYSFAIL  line  on  the  VMEbus  will  be 
asserted,  which  will  bring  the  system  to  its  knees  before  it  ever  gets  up  and 
running.  This  negation  may  be  accomplished  by  a  byte  write  of  $4  to  020 
address  $280001. 

It is good practice to clear the Zoran interrupts, reset the Zorans, and
clear the 020's status bits at boot. This may be accomplished by writing zero
to 020 longword location $1C0000 and $F to $1C0004.

Also  necessary  at  boot  is  loading  configuration  data  to  a  couple  of 
locations  in  the  DSACK  SRAM.  These  locations  are  for  accesses  to  the  MVME6000 
and/or  VME  bus,  and  the  PC  Interface  (if  used).  The  following  code  segment 
will  accomplish  the  DSACK  SRAM  Initialization. 


        MOVEA.L #$20000000,A0   ;DSACK SRAM ENABLE ADDRESS
        MOVEA.L #$60000000,A1   ;DSACK SRAM DISABLE ADDRESS
        MOVEA.L #$240000,A2     ;PC INTERFACE BASE ADDRESS
        MOVEA.L #$280000,A3     ;MVME6000 REGISTER SET BASE ADDRESS
        MOVE.L  #0,(A0)         ;ENABLE DSACK SRAM
        MOVE.L  #$4,(A2)        ;WRITE CONFIGURATION NYBBLE TO SRAM
        MOVE.L  #$1,(A3)        ;WRITE CONFIGURATION NYBBLE TO SRAM
        MOVE.L  #0,(A1)         ;DISABLE DSACK SRAM
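
The addresses used in this segment can be related back to the DSACK SRAM access rules in section 3.1.1 with the Python sketch below. It is a host-side check, not VPH code, and the function names are hypothetical. The SRAM index computation assumes the SRAM is addressed by A[31] together with A[24..18]; treat that bit selection as an assumption to be verified against the schematic.

```python
def dsack_mode(addr):
    """Classify an address by A[31..29]: 001 enables SRAM load mode, 011 disables it."""
    top = (addr >> 29) & 0b111
    if top == 0b001:
        return "enable"
    if top == 0b011:
        return "disable"
    return None


def dsack_sram_index(addr):
    """SRAM location selected by an address (assumed vector: A[31], A[24..18])."""
    return (((addr >> 31) & 1) << 7) | ((addr >> 18) & 0x7F)
```

Under that assumption, the write through A2 ($240000) loads SRAM location 9 and the write through A3 ($280000) loads location 10, so each memory block of 256 Kbytes has its own configuration nybble.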
3.1.4 VPH Installation and Setup Procedures

The following procedure describes the installation and setup of the VPH
and SBC. It shall be used for a cold start sequence (e.g. the unit directly
out of the box). The instructions are also useful when the board settings of
either the VPH or the SBC have been changed. Before any of the following
steps are taken, you should read and study the VPH User Manual, the MVME6000
manual, and the SBC Manual for the 135 board. A thorough understanding of the
address spaces of each board will be necessary if hardware or software
modifications are to be made. This will help prevent inadvertent address
space overlap.

MODE  1:  SBC  system  controller /VPH  non-system  controller 


1. Set the VPH switches as follows

   S1   1-8   all off            (address map)
   S2   1-4   off on on off      (AM code mode)
   S3   1-4   on off off off     (default DSACK wait states, used in
                                  expansion bus)


2. Set VPH jumpers as follows

   JMPR1                     (set for EPROM size)
   JMPR2   short 1 and 2     (VPH non-system mode)
   JMPR3                     (set for # of Zoran ext memory access wait states)
   JMPR4                     (set for # of Zoran ext memory access wait states)
   JMPR5                     (set for EPROM size)


3. Set the SBC switches as follows

   S3   1-8    #4 on, all others off
   S4   1-10   4, 8, 9 on, all others off


You are now configured for the SBC to operate as system controller.
Plug it in slot #1 (leftmost slot of chassis). 135 Dbug will run at its base
DRAM address. The SBC is configured to operate with 32-bit address and 32-bit
data.


3.1.5 Typical VPH Operation

The  following  sections  describe  the  typical  execution  sequence  that  Is 
recommended  for  the  VPH  325  chips.  The  current  set  of  application  code  has 
adhered  to  these  procedures.  They  serve  to  provide  a  uniform  basis  for  future 
coding  practices  and  will  maintain  better  documentation  If  consistency  Is 
applied  to  the  programsilng  methodology. 

The  major  programming  convention  Is  necessary  to  ensure  that  the  four 
325  chips  Initiate  activity  simultaneously.  In  this  manner  the  code  executed 
by  each  chip  will  start  at  the  same  point  In  the  programs  and  end  at  the  same 
point  In  the  programs.  Zorans  describe  execution  across  multiple  chips  as 
waves.  Hence,  synchronization  of  the  wave  processing  Is  desired.  We  say  that 
a  chip  or  a  set  of  chips  conq>letes  a  wave  when  each  and  every  chip  has 
executed  Its  code  segment  relative  to  that  wave. 
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Synchronization  is  depicted  in  Figure  15.  Here,  a  starting  routine  is 
executed  first.  In  the  current  suite  of  code,  a  routine  called  STAKTUP.ASM  is 
used  for  most  of  the  applications .  It  is  a  generic  routine  for  any  of  Zoran’ s 
application  libraries  as  well.  Startup  initializes  the  status  bits  in  the 
status  latch  so  that  the  68020  or  020  can  synchronize  Zorans.  In  startup  the 
Zorans  do  not  modify  the  status  bits.  In  a  polling  loop,  the  020  will  modify 
these  bits  when  it  is  ready  to  initiate  Zoran  starts  simultaneously. 

Once the 020 sets the status bits accordingly, the 325s begin wave 1
processing. Wave 1 processing consists of any routines a user wants the 325s
to execute, such as convolution or FFT. When each 325 that is processing has
completed its task, it sets its own status latch bits. Meanwhile the 020 has
been monitoring all bits in a poll status loop. Upon detecting that each and
every 325 has completed its wave 1 tasks, the 020 modifies the status bits to
allow the 325s to begin wave 2 processing.

Figure 15 depicts only two waves, but the concept is not limited to two
waves. As many waves or routines as are desired may be used in this method.
Further, the waves may be any routines desired by the user. They do not have
to be the same code.
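
The handshake in Figure 15 can be sketched in C as a small software model.
This is only an illustration under stated assumptions: the wave_state
structure stands in for the status latch, and the function names (wave_init,
host_start, zoran_finish_wave, host_poll) are hypothetical and are not taken
from the STC startup code or the Zoran libraries.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_ZORANS 4

/* Hypothetical software model of the status latch handshake. */
typedef struct {
    int  current_wave;        /* wave the 325s are currently allowed to run */
    bool done[NUM_ZORANS];    /* set by each 325 when its task for the wave ends */
} wave_state;

/* Startup: no wave released yet and no 325 has reported completion. */
void wave_init(wave_state *s)
{
    s->current_wave = 0;
    for (int i = 0; i < NUM_ZORANS; i++)
        s->done[i] = false;
}

/* The 020 releases wave 1 once it is ready to start all 325s together. */
void host_start(wave_state *s)
{
    s->current_wave = 1;
}

/* Called on behalf of 325 number `chip` (0-based) when it finishes its task. */
void zoran_finish_wave(wave_state *s, int chip)
{
    assert(chip >= 0 && chip < NUM_ZORANS);
    s->done[chip] = true;
}

/* One pass of the 020 poll loop. The next wave is released only when every
 * 325 has reported completion. Returns true if a new wave was released. */
bool host_poll(wave_state *s)
{
    for (int i = 0; i < NUM_ZORANS; i++)
        if (!s->done[i])
            return false;
    for (int i = 0; i < NUM_ZORANS; i++)
        s->done[i] = false;
    s->current_wave++;
    return true;
}
```

The model keeps the key property of the convention: no 325 can enter wave
N+1 until all of them have finished wave N.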

Another  important  programming  convention  is  the  consistent  usage  of  the 
stack  frames  as  depicted  in  Figure  16.  The  example  discussed  assumes  that  two 
325s  are  sharing  the  same  bus,  probably  325s  1  and  2  using  PRAM  1.  The 
convention  should  be  followed  no  matter  how  many  325s  are  used  or  how  many  325 
buses.  The  two  key  325  registers  are  the  stack  pointer  (SP)  and  the  program 
counter  (PC)  of  each  325.  To  synchronize  execution  across  multiple  325s,  it 
will  be  necessary  to  start  them  with  correct  program  starting  addresses. 
Those are popped off the stacks. A stack frame therefore consists of addresses
for important locations, such as the program starting addresses and the
locations of parameters passed into and out of the routine or subroutine.

Those  addresses  are  found  in  the  MAP  file  of  the  code  relevant  to  the 
current  application.  They  are  generated  by  the  Zoran  325  assembler  process. 
Each  address  must  be  linked  into  the  program,  so  a  specific  procedure  is 
followed.  The  Zoran  Assembler  Manual  explains  the  method.  The  current 
application  library  has  adhered  to  this  procedure  in  every  program. 

Typical execution begins with each 325 loaded with the correct PC and SP
values. Note that the SP points to the first location below the starting
location. Upon initiation of execution, the SP is incremented first and then
the value is popped off the stack. The 4-port serves as the data space for
each 325, and stack pointers 1 and 2 (or as many as you need) point into it.
The PRAM contains the actual routine used in the current application. The
code should always start at location 0000 as this makes assembly easier.
Also, keep sufficient space between each stack pointer in the PRAM so that
the 325s do not inadvertently write into your stack (as might occur with an
interrupt).
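
The increment-then-read behavior of the stack pointer can be illustrated
with a short C model. This is a sketch only: the zoran_stack structure and
stack_pop function are hypothetical, and the data space is represented as a
plain array of longwords rather than the real 4-port memory.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one 325 stack living in the data space. */
typedef struct {
    const uint32_t *mem;   /* data space holding the stack frame */
    uint32_t size;         /* number of longwords in mem */
    uint32_t sp;           /* stack pointer, as an index into mem */
} zoran_stack;

/* Pre-increment pop: SP starts one location below the first entry, so it
 * is advanced first and the entry it then points to is returned. */
uint32_t stack_pop(zoran_stack *s)
{
    assert(s->sp + 1 < s->size);
    s->sp += 1;
    return s->mem[s->sp];
}
```

With SP preloaded to the location just below the frame, the first pop
returns the first frame entry (for example a program starting address) and
the next pop returns the entry after it.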
Figure 15. Synchronization

Figure 16. Parameter Passage to Routines via Stacks


A typical stack frame is shown on the left of Figure 16. Two routines
are assumed, each with a synchronization call and a list of parameters. The
last routine will also execute an STC-provided FINISH routine. FINISH cleans
up the status latch bits to indicate to the 020 that the wave(s) have been
executed by all 325s. The 020 then uploads the results into the correct 4-port
space. This activity is shown in Figure 17. Again for consistency, all
current programs follow this activity flow. Near the bottom of the chart is a
decision box. If more routines are to be executed, the resultant path depends
on the routines invoked. Typically, the path continues up to set the 325 mode
bits.

3.1.5.1 System Bootup

To bring up the VPH system with the 68020 monitor program, just turn on
the power. If the VPH stops responding for some reason, it can be reset with
the reset switch found on the board itself.

3.1.5.2 Initialization

If the Io monitor program with a PC is being used, it is important to
set up its status register manually. Then the Zoran interrupts must be
cleared and the Zorans must be reset again. The steps for this are as
follows. Keystrokes are shown in square brackets.

1. set the port to the status register [P 362 <CR>]

2. clear the interface by writing ones [W FFFF <CR>]

3. set up the correct status values [W 186C <CR>]

4. set the port back to the FIFOs [P 360 <CR>]

5. clear the interrupts with a poke of 0 to address 1C0000
[W 12 <CR> W 0 <CR> W 1C <CR> W 0 <CR> W 0 <CR>]

6. reset the Zorans with a poke of F to address 1C0004
[W 12 <CR> W 4 <CR> W 1C <CR> W F <CR> W 0 <CR>]

If a script is being used, all of these operations can be conveniently
performed by a single call to the Init() function.
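
The five W entries used for the pokes in steps 5 and 6 appear to follow a
fixed pattern: a command word of 12, then the low and high 16-bit halves of
the address, then the low and high halves of the data. That reading is
inferred from the keystroke examples in this section, not from a monitor
specification. Under that assumption, a helper (hypothetical name
encode_poke) could build the word sequence as follows.

```c
#include <assert.h>
#include <stdint.h>

/* First word of the W sequences shown in steps 5 and 6 (assumed to be the
 * monitor's poke command code). */
#define MONITOR_CMD_POKE 0x12

/* Split a 32-bit address and value into the five 16-bit words written to
 * the port: command, address low, address high, data low, data high. */
void encode_poke(uint32_t addr, uint32_t value, uint16_t words[5])
{
    assert(words != 0);
    words[0] = MONITOR_CMD_POKE;
    words[1] = (uint16_t)(addr & 0xFFFFu);
    words[2] = (uint16_t)(addr >> 16);
    words[3] = (uint16_t)(value & 0xFFFFu);
    words[4] = (uint16_t)(value >> 16);
}
```

For step 6, encode_poke(0x1C0004, 0xF, words) yields 12, 4, 1C, F, 0 in
hexadecimal, which matches the keystrokes listed.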

3.1.5.3 Transfer Programs to Zoran Program RAM (PRAM)

If the Io monitor program is being used, programs can be downloaded with
the Download command. As an example, assume that the file fft2d32.s is being
downloaded to PRAM1 and PRAM2, which start at addresses 100000 and 180000.
The command sequence would be

[D fft2d32.s <CR> 100000 <CR> D fft2d32.s <CR> 180000]

If a script is being used, programs can be downloaded with a call to the
Download function. For the example, the call would be

Download("fft2d32.s", 0x100000);

Download("fft2d32.s", 0x180000);


Macro definitions can be used to simplify this to

#define PRAM1 0x100000
#define PRAM2 0x180000
Download("fft2d32.s", PRAM1);

Download("fft2d32.s", PRAM2);

Figure 17. Typical VPH Activity Flow Chart

3.1.5.4 Data Transfer to/from Four Port Memory

If the Io monitor program is being used, data files can be downloaded
with the Download command as well. These files will generally be ASCII
hexadecimal files. If the Zoran or Motorola assemblers are used to create
data files to go into the four port memory, an address offset of zero is used
instead of the values here. This is because the S format files already
contain the correct addresses for each record. This was not the case for the
program files being transferred to PRAM because address zero in the PRAM
appears at 100000 or 180000 in the 68020 address space. Here is an example of
downloading a data file to the four port, which starts at address 80000.

[D fft2d32.dat <CR> 80000 <CR>]

With a script, this would be performed by a call to the Download
function as follows.

#define FOUR_PORT 0x80000
Download("fft2d32.dat", FOUR_PORT);

For uploading results, the Upload command is used. This command
requires a size in longwords and produces an ASCII hexadecimal file as output.
From the monitor, the command to upload the 2048 (800 hexadecimal) longwords
of results of the fft2d32 program from four port would be as follows.

[U 80000 <CR> 800 <CR> fft2d32.out <CR>]

With a script, this would be performed by a call to the Upload function
as follows.

Upload(FOUR_PORT, 2048, "fft2d32.out");

3.1.5.5 Setting the Zoran Registers

The Zoran internal registers can be accessed from the 68020. Each Zoran
is mapped into a different set of memory locations. These are documented in
the hardware memory map, but will be repeated here for convenience. Zoran 1
is at C0000, Zoran 2 is at C1000, Zoran 3 is at 140000, and Zoran 4 is at
141000. The register offsets from these starting addresses are listed in the
Zoran Engineering Data Manual. These offsets must be shifted left two bits to
convert them from addresses of longwords to addresses of bytes. Some of the
more important resulting offsets are the stack pointer at 414, the program
counter at 404, and the mode register at 408. A specific Zoran register can
be accessed by adding the offset to the starting address. For example, the
Zoran 2 stack pointer is at address C1414. To write the value 33 to that
stack pointer from the monitor would require the following commands.

[W 12 <CR> W 1414 <CR> W C <CR> W 33 <CR> W 0 <CR>]

To perform the same operation from a script would require a call to the
Poke function with appropriate parameters.

#define ZORAN2 0xc1000
#define SP_OFFSET 0x414
Poke(ZORAN2 + SP_OFFSET, 0x33);
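
The address arithmetic described in this section can be collected into one
helper. The sketch below is an assumption-laden illustration: the function
name zoran_reg_addr and the REG_ constants are hypothetical, and the
unshifted longword offsets (101, 102, 105 hexadecimal) are simply the byte
offsets quoted above shifted right two bits, not values copied from the
Zoran Engineering Data Manual.

```c
#include <assert.h>
#include <stdint.h>

/* Base addresses of Zoran 1..4 in the 68020 address space (from the text). */
static const uint32_t zoran_base[4] = { 0xC0000u, 0xC1000u, 0x140000u, 0x141000u };

/* Longword offsets. Shifted left two bits they give the byte offsets
 * 0x404 (PC), 0x408 (mode) and 0x414 (SP) quoted in the text. */
enum { REG_PC = 0x101, REG_MODE = 0x102, REG_SP = 0x105 };

/* Byte address of a Zoran register as seen by the 68020.
 * zoran is 1..4; longword_offset is the unshifted register offset. */
uint32_t zoran_reg_addr(int zoran, uint32_t longword_offset)
{
    assert(zoran >= 1 && zoran <= 4);
    return zoran_base[zoran - 1] + (longword_offset << 2);
}
```

For example, zoran_reg_addr(2, REG_SP) evaluates to C1414 hexadecimal, the
Zoran 2 stack pointer address used in the example above.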

Similar methods are used to write to the other registers. Writing to
the PC causes the Zoran to begin executing at the address written. The mode
register has many bits which should not be altered. The initial state is
acceptable. If speed of execution is important, the number of wait states for
memory access can be reduced from one to zero by writing the appropriate
value. This is performed from a script as follows.

#define MODE_OFFSET 0x408
Poke(ZORAN2 + MODE_OFFSET, 0x70f251);

3.1.5.6 Accessing the Status Latch

The 68020 can modify its status latch values by writing to address
1C0000. The status latch bits are the bottom two. The 68020 can interrupt
the Zorans by setting higher bits in the same location, so only the bottom two
bits should be set when modifying the status latch. Commands from the monitor
to set the upper status bit (status value 2) would be as follows.

[W 12 <CR> W 0 <CR> W 1C <CR> W 2 <CR> W 0 <CR>]

From a script file, the same operation would be performed with a call to
the Poke function.

#define STATUS_LATCH 0x1c0000
Poke(STATUS_LATCH, 0x2);

The 68020 can read back the status latch, but it will not contain the
value that was written. Instead it will contain the values written by the
Zorans in the bottom byte. To read it from the monitor would require the
following commands.

[W 11 <CR> W 0 <CR> W 1C <CR> R R]

To read it from a script program and assign its value to a variable
would require a call to the Peek function.

long value;

value = Peek(STATUS_LATCH);

All processors write to the bottom two bits of the status register.
When they read from the status register, they see the values written by the
other processors. The 68020 sees the values in the order Zoran 4 bits, Zoran 3
bits, Zoran 2 bits, Zoran 1 bits, listed from most significant to least
significant. Each Zoran sees the values in an order that is symmetrical with
respect to itself and the bus it is on. Most importantly, the 68020 bits are
seen at the same place by each Zoran. This allows more convenient coding for
communication. The order is opposite bus high Zoran, opposite bus low Zoran,
same bus other Zoran, and 68020 bits.
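
The bit ordering seen by the 68020 can be unpacked with a shift and a mask.
The following C sketch assumes two status bits per Zoran packed into the
bottom byte, with Zoran 1 in the least significant pair; the function names
zoran_status and all_zorans_report are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Return the two status bits of Zoran n (1..4) from the value the 68020
 * reads back. Zoran 4 occupies the top two bits of the bottom byte and
 * Zoran 1 the bottom two. */
unsigned zoran_status(uint32_t latch, int n)
{
    assert(n >= 1 && n <= 4);
    return (unsigned)((latch >> (2 * (n - 1))) & 0x3u);
}

/* Nonzero when every Zoran reports the same two-bit status value. */
int all_zorans_report(uint32_t latch, unsigned status)
{
    for (int n = 1; n <= 4; n++)
        if (zoran_status(latch, n) != status)
            return 0;
    return 1;
}
```

A poll loop on the 68020 could call all_zorans_report on the value returned
by Peek to decide when a wave has completed.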

3.1.6 VPH Scripts

The VPH is delivered with a set of applications programs found in the
appendices. Some of these programs have been collected into a type of "main"
program called a "script". A script is an organized collection of routines and
subroutines that eliminates many of the keystrokes needed when a command
processor like the Io monitor used by STC to demonstrate the VPH is invoked.
A script assembles all of the necessary commands into a single command entry,
which is typically the filename of the application itself. For instance, if
an FFT program were to be executed, several commands to the command processor
are necessary. They are the data space setup commands, the status latch setup
commands for the 68020 and the 325s, download commands, and upload commands
for the results. Six scripts have been provided with the VPH, including 2-D
FFTs for 8x8, 16x16, and 32x32, a 1K FFT, real and complex convolution and
correlation, and coordinate conversion routines.

3.2 CPH Functional Units

From a programmer's perspective (Figure 18), the CPH consists of two
multipliers and two ALUs connected to cache and auxiliary memory via a
crossbar switch. It is important to note that the crossbar switch is fully
programmable in one clock cycle. Also, it is a fully parallel gateway. All
selected paths are available in one clock cycle. Furthermore, the crossbar
has an internal register file which is available to any other resource.

The address generation is performed by a separate board called the
address generator board. Details of this board are described elsewhere. The
address generator board also contains a set of crossbars. Microprogramming
the CPH consists of using the 768-bit microword depicted in the appendix. All
fields are simultaneously available. Hence, the CPH is a true Very Long
Instruction Word (VLIW) machine. Because the multipliers are faster than the
memory chips, one stage of pipelining is added to all data paths and is shown
in the figure. Microwords are emitted as two phases of 768/2 or 384 bits.
The machine definition file in the appendix for the CPH shows which fields are
active in each phase. When a field is active in both phases, the ASSIGN
statement is repeated for those fields except that the physical bits differ
per phase.

3.2.1 Processor


The processor board is the numerical engine of the CPH architecture.
Each board contains 2 BIT 2110 ALU devices and 2 BIT 2120 Multiplier devices.
They are connected to other resources via nine xbar devices. Nine are used so
that parity can be generated. Otherwise the 32-bit space would only require 8
xbars on the processor board. The organization is shown in Figure 18. It is
useful as a programmer's model because it details the port assignments for
each xbar and the microinstruction fields relevant to each port.

From this figure we see that the architecture is a two phase pipelined
organization. All resources have the capability to pipe two levels of data.
This was done so that the slower memory devices can conceptually keep up with
the faster 2120s on the processor board. It is important to note that the
ALUs do not have on-chip registers. So an external register file is provided
which is embedded in the XBAR chips as a 64 word file arranged in an 8x8
array. The register file is general enough to allow FIFO, shift left, and
shift right operations on it. These are called register mode operations and
are fully described in the xbar section of this report.

The processor board contains a writable control store (WCS) for the
control points on the board. Twelve microprogram memory modules are used.
They are partitioned into real and imaginary fields and are signified by
"MEM72" labels on the schematic. The WCS instructions are chosen so that
complex arithmetic operations are facilitated by their respective real and
imaginary parts. The WCS is downloaded from the IOP board. A WCS allows
dynamic microprogramming so that multiple microroutines can be executed
without excessive host interaction. The modules have been designed,
fabricated and tested. A spare module is also being supplied. These modules
are identical to the WCS modules in the address generator board where the EVA
master control store resides. The WCS essentially supports reconfigurability
of the ALUs and multipliers by microprogram control. Some of the options are
depicted in Figure 19. Those shown are often useful for inner and outer
product operations on matrices.

The current status of the processor board design will require adding
error FIFO flags (only if arithmetic status conditions are needed) and ECL
clock distribution circuitry to the board. All other data and control paths
have been assigned and entered into the schematic. Should a slower clock be
used, ECL logic can then be replaced with CMOS clock distribution nets. The
design will become much simpler in the process. Also, the high speed I/O
(HSIO) control circuitry needs to be added to the schematic.

The original Phase I design for this board relied on the availability of
end around carries being generated by the ALUs and multipliers. End around
carries are necessary for two's complement arithmetic. However, when the
final data specifications were completed by BIT, this signal was not provided.
Hence, cascading these 32-bit chips via 32-bit boards became impossible. The
current design then doubled the number of engines per board so that each board
could behave as a 32-, 64- or 128-bit board under microprogram control. In
this way, reasonable emulation speeds could be maintained and across-a-bus
delays are eliminated.

The current processor board design connects the xbars to the cache
memory via the CPH backplane as shown in the CPH Physical Layout in Figure 20.
This figure is important when maximum execution speed is desirable in the
microprograms. The slowest path will always be the one which takes the data
off the board. Hence, when writing new microcode, the user should realize, as
shown in the figure, that the cache accesses will take place across the CPH
backplane. The same is true for the I/O path.

3.2.2 Cache Memory

The cache memory board is a versatile module for the CPH. It is
designed to be cascaded so that memory space is limited only by the physical
dimensions of the mainframe space. This cache can also be viewed as the main
memory space of the EVA. It uses cache memory modules which have been
designed, fabricated and fully tested. The board itself, which houses the
separate modules, has not been fabricated. Each module is a SIMM or strip of
discrete memory chips mounted on a small circuit board as shown in Figure 21.
Fabricating the SIMMs this way allowed us to design very dense cache memory
boards.

The individual memory cells of the modules use a 3-port cell scheme as
depicted in Figure 22. Here, we see that data ports A and B are output ports,
while data port C is an input port. This is important to remember when
microcoding the CPH because certain ports are read-only and others are
write-only. The fields in the microinstruction reflect these conventions
also. Note that the clock timing is a 4-phase clock with two phases 180
degrees out of phase and the other two clocks in quadrature with these two
phases. A 4-phase clock scheme was chosen to maximize throughput of the
modules. The cache memory bus timing follows in Figure 23. Bus timing
evaluation is necessary to complete the backplane clock distribution design.

The cache memory board is currently in design and its schematic is
nearly 75% completed. Its RAM timing has been fully specified by Figure 24.
Here, it is important to note that the 4-phase clock is still needed on the
board itself. Also, when future microcoding starts, the code should observe
the timing delays to be encountered by the clocks. For example, the last line
shows that the "A DATA OUT" signal will generate the most significant data
word first followed by the least significant data word. When microcoding the
cache accesses, the coder should allow for this multiplexing of the MS and LS
words.


The cache memory board can be configured as follows:

Memory block - 16k x 36 (or 64k x 36) unit of memory. A jumper should
reside on the board to set the size of each of the two blocks resident on the
board. Pinouts of the cache memory modules are identical for both possible
sizes; the only difference is that the two MSBs of the address are not used
on the 16k modules.

Memory bank - a 256K deep region of memory. There may be a maximum of
16 banks each of cache and auxiliary memory.

Both blocks of memory on each cache board must be configured as either
cache or auxiliary.

Address  ports  - 

Port  A  -  Cache  Complex  Read  Port 
Port  B  -  Cache  Complex  Read  Port 
Port  C  -  Cache  Real  Write  Port 
Port  D  -  Cache  Imaginary  Write  Port 
Port  E  -  Auxiliary  Memory  Port 
Port  F  -  H.S.I.O.  Port 

Data ports -

Port A - Cache Real Read         Port A - Cache Imaginary Read
Port B - Cache Real Read         Port B - Cache Imaginary Read
Port C - Cache Real Write
Port D - Cache Imaginary Write
Port E - Aux. Real Read          Port E - Aux. Imaginary Read
Port F - HSIO Real (R/W)         Port F - HSIO Imaginary (R/W)

Address port pairs A & C and B & D are time-multiplexed (they are
physically the same backplane pins). During clock phase 0, ports C and D are
active; during clock phase 1, ports A and B are active.

In  addition,  time  multiplexing  exists  on  the  E  address  port.  During 
phase  0,  address  port  E  carries  bank  addresses.  Bits  0-3  are  the  cache  bank 
address  and  bits  4-7  are  the  aux.  bank  address.  During  phase  1,  address  port 
E  carries  an  aux.  memory  address. 
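
The phase 0 bank byte on address port E can be packed and unpacked with
simple masks. The sketch below follows the bit positions given above (bits
0-3 for the cache bank, bits 4-7 for the aux. bank); the bank_select
structure and both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bank selection carried on address port E during clock phase 0. */
typedef struct {
    unsigned cache_bank;   /* bits 0-3 */
    unsigned aux_bank;     /* bits 4-7 */
} bank_select;

/* Pack the two 4-bit bank numbers into the phase 0 byte. */
uint8_t encode_bank_byte(unsigned cache_bank, unsigned aux_bank)
{
    assert(cache_bank < 16 && aux_bank < 16);   /* at most 16 banks each */
    return (uint8_t)((aux_bank << 4) | cache_bank);
}

/* Split the phase 0 byte back into its two bank numbers. */
bank_select decode_bank_byte(uint8_t port_e_phase0)
{
    bank_select b;
    b.cache_bank = port_e_phase0 & 0x0Fu;
    b.aux_bank   = (port_e_phase0 >> 4) & 0x0Fu;
    return b;
}
```

Four bits per field is consistent with the stated maximum of 16 banks each
of cache and auxiliary memory.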

The 8-bit configuration address, which is used to address each cache
board uniquely during the system configuration process, may appear on any of
address ports A, B, C, or D. This is your choice. The configuration address
of each board is set on a DIP switch on that board.

In  addition  to  bank  address  and  selecting  either  cache  or  aux.  memory, 
configuration  data  must  Include  whether  a  block  of  memory  is  the  most  or  least 
significant  word.  Also,  the  offset  into  the  bank  will  be  required  for  each 
block. 

Separate decoding circuitry will be required for cache, auxiliary, and
HSIO addresses. A given bank of memory may not be accessed by the processor
and the IOP at the same time. Because of this limitation, if a valid cache or
aux. bank address is presented to the board, the processor addresses are
captured by the first level of decode circuitry, regardless of whether a
valid HSIO address is present or not.

There are 4 bits of microcode resident on each board for each of the
two clock phases. These bits are /WRCAr, /WRCAi, /WRAUXr, and /WRAUXi
during phase 0, and /RDA, /RDB, /RDEr, and /RDEi during phase 1.

3.2.3 Address Generator (AG)


A considerable effort was expended to enhance many of the address
generator's circuits. High speed ALU and memory chips finally arrived by
March 1990, but development of the required "glue logic" chips lagged behind.
The address generator requires 16-bit wide counters and adders capable of a 40
MHz clock rate. These parts were unavailable in 1990. Many times, PALs could
have been used to implement functions not available as standard devices. New
larger and higher speed PAL-type devices have only recently been developed.
Unfortunately, they are still too slow. The smaller PAL devices are capable
of high speed; however, it is necessary to cascade multiple devices together,
and the combined delay was too great. The devices large enough to fit these
functions on a single chip were too slow.

Several  companies  had  large  high  speed  PAL  devices  under  development 
during  1990.  Cypress,  AMD,  Plus  Logic,  and  Altera  released  new  devices  that 
year.  Some  of  these  new  parts  are  now  fast  enough  to  solve  many  of  the  speed 
problems.  Also,  Integrated  Device  Technology  plans  to  make  available  many 
standard  logic  functions  in  a  new  high  speed  BiCMOS  technology. 

The address generator is designed to support multiple matrix addressing
tasks directly in hardware. The purpose of the AG board is to reduce the
overhead normally incurred by computing complex addresses in software. To
keep the overhead down, four 2-D counters are available on the board to assist
memory access in a matrix. A dataword can be accessed randomly, in a row,
down a column, down a diagonal, down a subdiagonal, and all of the above in
the opposite direction. The 2-D counter circuits are depicted in Figure 25.
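
The access patterns listed above can be illustrated with a software model of
one 2-D counter. This is a sketch only, not a description of the Figure 25
circuit: it assumes a row-major matrix layout, and the counter2d structure,
the step_mode enum, and the function names are all hypothetical. A
subdiagonal walk is obtained by preloading a starting row or column other
than zero and stepping diagonally.

```c
#include <assert.h>
#include <stdint.h>

/* Step patterns named in the text. */
typedef enum { STEP_ROW, STEP_COLUMN, STEP_DIAGONAL } step_mode;

typedef struct {
    uint32_t base;   /* address of element (0,0) */
    uint32_t cols;   /* row length in words (row-major layout assumed) */
    int32_t  row;    /* current row index */
    int32_t  col;    /* current column index */
} counter2d;

/* Current address: base + row * cols + col. */
uint32_t counter2d_addr(const counter2d *c)
{
    assert(c->row >= 0 && c->col >= 0);
    return c->base + (uint32_t)c->row * c->cols + (uint32_t)c->col;
}

/* Advance one element. dir is +1, or -1 for the opposite direction. */
void counter2d_step(counter2d *c, step_mode mode, int dir)
{
    assert(dir == 1 || dir == -1);
    if (mode == STEP_ROW || mode == STEP_DIAGONAL)
        c->col += dir;      /* move along the row */
    if (mode == STEP_COLUMN || mode == STEP_DIAGONAL)
        c->row += dir;      /* move down (or up) the column */
}
```

Four such counters stepping in parallel would correspond to the four
concurrent addresses the AG board can generate.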

The 2-D counters are designed with IDT7381L20 high speed adders. These
adders were to be found in a Plus Logic 2040 FPGA, but the 2040 did not become
available during this Phase II effort. The IDT7217L25 multipliers are used
for address offset computations executed directly in hardware. This hardware
address generation method reduces the overhead of complex address generation
to a minimum. Although the AMD 29540 is shown in the figure, the device has
since been deleted from AMD inventory with no second sourcing. Should future
availability occur, then these devices should be incorporated in the position
shown in this figure. A discrete logic implementation of this device was
carried out, but over 40 16-pin devices are needed. Hence, the FFT hardware
address generation feature of the CPH had to be deleted.

Address generation is done by preloading the counters with the
appropriate starting address and counting up or down as required. Control is
accomplished with fields in the microinstruction such as 2-D counters #1, #2,
#3, and #4. The microorders are fully parallel across the 4 counters. As a
result, 4 concurrent addresses can be generated and sent anywhere in the CPH
by virtue of the crossbar switch. The block diagram of the AG board follows
in Figure 26. The AG board houses the microprogram control unit for the CPH.
Here, one finds the microsequencer control for program control. Another
microprogram memory resides on the processor board, but this is simply
writable control store. Once a program is downloaded to the processor board,
execution of microinstructions on that board follows sequentially.

3.2.3.1 CPH Address Generator Board Download

Recall that the address generator board houses the central control store
of the EVA. To download EVA microprograms from the I/O Processor (IOP) to the
Address Generator Board (AG), the code running on the CPH system must request
a program download by pulling the Download Request (DLRQST) line low on the
high-speed I/O bus (HSIOB). This not only requests the IOP to download the
program, it also causes the microsequencer to push the program counter onto
the stack and to halt. The status of all counters, RAM (other than program
RAM), and other circuits is preserved at that moment. The IOP then downloads
the program to the program RAM as follows:

The IOP places the Program RAM address onto the HSIOB I/O address lines.
Each board in the system decodes the address and the targeted board latches
the address.

3.2.4 I/O Processor Purpose and Features

The I/O Processor (IOP) serves as the communication link among the CPH
system via the High-Speed I/O (HSIO) bus, an IBM-PC via the I/O (PCIO) port,
and the VME VPH processor via the Serial I/O (SIO) port. The SIO port
communicates directly to a buffer/communications board residing in a VME
chassis, so optionally this port can serve as the host rather than an IBM-PC
if desired.

The microcontroller on board the IOP is entirely interrupt driven. In
response to an interrupt received from one of the I/O interfaces, it executes
the interrupt service routine pointed to by its internal interrupt vector
table. In the case of an interrupt from the host, this routine simply reads a
command from the interface and executes it. This will generally be a command
to transfer a block of data from/to the host. This is done by initializing
one of two data transfer counters, initializing the appropriate interface
control registers, and then setting the GO control bit on the "sending"
interface's control register. The control logic for each interface handles
the necessary handshaking to complete the data transfer, including monitoring
flags and generating read and write signals, all independent of
microcontroller intervention. Upon completion of the transfer, the "sending"
interface generates an interrupt, and the microcontroller performs the
necessary resource allocation cleanup.

3.2.4.1 IOP Control Registers

Addressing the control registers is accomplished by setting the
microcode control address field to the address indicated below in each
register description. Bits in the microcode data field may be either data
write enable bits or data bits, as defined in each control register
description. In order to modify a bit in a control register, the control bit
associated with the data bit must be set LOW, the data bit(s) must be set to
the desired value, and the correct address must be present. When all this
occurs along with the Control Register Write (CRW) microcode bit set LOW, the
change will occur.
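
The write rule above amounts to a masked update gated by two conditions. A
minimal C model of that rule is sketched below; the function name
ctrl_reg_write, the 16-bit register width, and the representation of the
address comparison as a single flag are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Return the new register contents after one microcode cycle.
 *   current    : present register value
 *   enable_n   : per-bit write enables, active LOW (0 = update this bit)
 *   data       : desired values for the bits being written
 *   crw_n      : Control Register Write strobe, active LOW
 *   addr_match : nonzero when the control address selects this register */
uint16_t ctrl_reg_write(uint16_t current, uint16_t enable_n, uint16_t data,
                        int crw_n, int addr_match)
{
    if (crw_n != 0 || !addr_match)
        return current;                      /* no write this cycle */

    uint16_t mask = (uint16_t)~enable_n;     /* 1 = bit is being updated */
    return (uint16_t)((current & ~mask) | (data & mask));
}
```

Bits whose enable is HIGH keep their old value, which is what lets one
microinstruction change a single control bit without disturbing the rest.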
RESOURCE ALLOCATION ADDRESS 0

PAL FILES: CTRL8.PDS

PAL DEVICE: PALCE26V12

This register indicates what resources are currently in use and which
are available. These bits are undefined at power-up or after a reset and must
therefore be initialized prior to operation. The resources are:


MICROCODE BITS
control   data
   19       7       Counter A
   18       6       Counter B
   17       5       High-Speed I/O Interface Receive
   16       4       High-Speed I/O Interface Send
   15       3       Serial I/O Interface Receive
   14       2       Serial I/O Interface Send
   13       1       IBM-PC Interface Receive
   12       0       IBM-PC Interface Send

Each software routine which uses a resource first checks its
availability. Once the routine has determined that the resource is available
by detecting a HIGH in the appropriate bit, it sets that bit LOW to indicate
that it is in use. All interrupts must be disabled during this portion of the
code. The bits are read using the microsequencer flag (condition) input.
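The check-then-claim sequence described above might look like the following sketch. The resource names and helper functions are hypothetical; only the bit positions and the HIGH = available convention come from the table. A Python function cannot disable interrupts, so the requirement that the test and the update run with interrupts disabled is only noted in a comment.

```python
# Data bit position of each resource, taken from the table above.
RESOURCES = {
    "COUNTER_A": 7, "COUNTER_B": 6,
    "HSIO_RECEIVE": 5, "HSIO_SEND": 4,
    "SIO_RECEIVE": 3, "SIO_SEND": 2,
    "PC_RECEIVE": 1, "PC_SEND": 0,
}

def try_claim(register, name):
    """Return (new_register, claimed). HIGH means the resource is available.

    On the real hardware the test and the update must run with all
    interrupts disabled so that no other routine can claim the resource
    in between.
    """
    bit = 1 << RESOURCES[name]
    if register & bit:
        return register & ~bit, True   # set LOW: now in use
    return register, False             # already in use

def release(register, name):
    """Mark the resource available again by setting its bit HIGH."""
    return register | (1 << RESOURCES[name])
```

A routine that fails to claim a resource would typically retry later or pick an alternative, such as the other data transfer counter.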

IBM-PC  INTERFACE  CONTROL  ADDRESS  1 

PAL  FILES: 

PAL  DEVICE: 

This register contains all IBM-PC receiver interface controls, controls
which are common to both the IBM-PC transmit and receiver interfaces, and
controls that are initialized during reset and normally remain unchanged
afterwards. Upon reset all outputs are set HIGH.

MICROCODE BITS
control  data
  17      8      RECEIVE - Allows sending interface to send.
  16      7      SOURCE - Selects the source interface when receiving
                 data - LOW is SIO, HIGH is HSIO
  15      6      CLRINT - Interrupts cleared when LOW
  14      5      ENINT - Interrupts enabled when HIGH
  13      4      ODD/EVEN - Parity ODD when LOW
  13      3,2,1  CLK 2, CLK 1, CLK 0

                 CLK 2  CLK 1  CLK 0
                   0      0      0     500 kHz
                   0      0      1       1 MHz
                   0      1      0       2 MHz
                   0      1      1       4 MHz
                   1      0      0       8 MHz
                   1      0      1      16 MHz
                   1      1      0      16 MHz
                   1      1      1     500 kHz

  12      0      RESET - Reset the IBM-PC interface when LOW


IBM-PC  INTERFACE  TRANSMIT  CONTROL  ADDRESS  2 

PAL FILES:

PAL DEVICE:

This register controls the operation of the IBM-PC transmit interface.
Upon power-up reset or IBM-PC interface reset all bits are set HIGH.


MICROCODE BITS
control  data
  18      7      PMRSTAT1 - STAT1 receive interrupt mask
  17      6      PMRSTAT0 - STAT0 receive interrupt mask
  16      5      GO - Enables sending data when LOW
  15      4      PSELAB - Selects which counter is assigned to the
                 IBM-PC interface for sending data - LOW is counter A,
                 HIGH is counter B
  14      3,2    REAL, IMAG

                 REAL  IMAG
                  0     0    64-bit, low word first
                  0     1    32-bit, imaginary data
                  1     0    32-bit, real data
                  1     1    64-bit, high word first

  13      1      XSTAT1 - Transmit status bit 1
  12      0      XSTAT0 - Transmit status bit 0

IBM-PC  INTERFACE  INTERRUPT  MASK  ADDRESS  3 

PAL  FILES: 

PAL  DEVICE: 


The interrupt is masked when the bit is set HIGH and enabled when set
LOW. Upon power-up reset or IBM-PC interface reset all bits are set HIGH.

MICROCODE BITS
control  data
  13      9     PREF    Receive Empty Flag
  13      8     PRAEF   Receive Almost Empty Flag
  13      7     PRHF    Receive Half Full Flag
  13      6     PRAFF   Receive Almost Full Flag
  13      5     PRFF    Receive Full Flag
  12      4     PXEF    Transmit Empty Flag
  12      3     PXAEF   Transmit Almost Empty Flag
  12      2     PXHF    Transmit Half Full Flag
  12      1     PXAFF   Transmit Almost Full Flag
  12      0     PXFF    Transmit Full Flag

SERIAL I/O INTERFACE CONTROL  ADDRESS 4

PAL FILES:

PAL DEVICE:

This register contains all Serial I/O receiver interface controls,
controls which are common to both the Serial I/O transmit and receiver
interfaces, and controls that are initialized during reset and normally remain
unchanged afterwards. Upon power-up reset all outputs are set HIGH.

MICROCODE BITS
control  data
  17      7      LOOPEN - Receive and transmit loopback outputs enabled
                 when HIGH, serial outputs when LOW
  16      6      SOURCE - Selects the source interface when receiving
                 data - LOW is IBM-PC, HIGH is HSIO
  15      5      CLRINT - Interrupts cleared when LOW
  14      4      ENINT - Interrupts enabled when LOW
  13      3,2,1  XSEL2, XSEL1, XSEL0

                 XSEL2  XSEL1  XSEL0
                   0      0      0     HIGH
                   0      0      1     Receive FF
                   0      1      0     Receive AFF
                   0      1      1     Receive HFF
                   1      0      0     Receive AEF
                   1      0      1     Receive EF
                   1      1      0     XSTAT0
                   1      1      1     LOW

  12      0      RESET - Reset the interface when LOW


SERIAL I/O INTERFACE TRANSMIT CONTROL  ADDRESS 5

PAL FILES:

PAL DEVICE:

This register controls the operation of the Serial I/O interface. Upon
power-up reset or Serial I/O interface reset all bits are set to HIGH.

MICROCODE BITS
control  data
  17      6      SXRESET - Reset the transmit interface
  16      5      GO - Begins sending data when LOW
  15      4      Selects which counter is assigned to the SIO interface
                 for sending data - LOW is counter A, HIGH is counter B
  14      3,2    REAL, IMAG

                 REAL  IMAG
                  0     0    64-bit, low word first
                  0     1    32-bit, imaginary data
                  1     0    32-bit, real data
                  1     1    64-bit, high word first

  13      1      XSTAT1 - Transmit status bit 1
  12      0      XSTAT0 - Transmit status bit 0
SERIAL I/O INTERFACE TRANSMIT INTERRUPT MASK  ADDRESS 6

PAL FILES:

PAL DEVICE:

The interrupt is masked when the bit is set HIGH and enabled when set
LOW. Upon power-up reset or Serial I/O interface reset all bits are set HIGH.

MICROCODE BITS
control  data
  13      9     PREF    Receive Empty Flag
  13      8     PRAEF   Receive Almost Empty Flag
  13      7     PRHF    Receive Half Full Flag
  13      6     PRAFF   Receive Almost Full Flag
  13      5     PRFF    Receive Full Flag
  12      4     PXEF    Transmit Empty Flag
  12      3     PXAEF   Transmit Almost Empty Flag
  12      2     PXHF    Transmit Half Full Flag
  12      1     PXAFF   Transmit Almost Full Flag
  12      0     PXFF    Transmit Full Flag

HIGH-SPEED I/O INTERFACE CONTROL  ADDRESS 7

PAL FILES:

PAL DEVICE:

This register controls the operation of the High-Speed I/O (HSIO)
interface. After reset all bits are set to HIGH.

MICROCODE BITS
control  data
  17      7      MEM - I/O HIGH, Memory LOW
  16      6      WRITE - Read HIGH, Write LOW
  15      5,4    SOURCE - Selects the source interface when receiving
                 data

                 BIT5  BIT4
                  0     0    Microprogram ROM
                  0     1    IBM-PC Interface
                  1     0    Serial I/O Interface
                  1     1    None

  14      3      GO - Begins sending data when LOW
  13      2      HSELAB - Selects which counter is assigned to the HSIO
                 interface for sending data. LOW is counter A, HIGH is
                 counter B
  12      1      REAL
  12      0      IMAG

                 REAL  IMAG
                  0     0    64-bit, low word first
                  0     1    32-bit, imaginary data
                  1     0    32-bit, real data
                  1     1    64-bit, high word first

DATA TRANSFER COUNTER A  ADDRESS 8

bits 19:0  Data transfer count to load
DATA TRANSFER COUNTER B  ADDRESS 9

bits  19:0  Data  transfer  count  to  load 

MACRO  RAM  ADDRESS  REGISTER  ADDRESS  10 

bits  12:0  Directly  addresses  MACRO  RAM 

MACRO  RAM  ADDRESS  COUNTER  ADDRESS  11 

bits  11:0  Parallel  loads  counter  which  directly 
addresses  MACRO  RAM 

MACRO  RAM  COUNTER  REGISTER  ADDRESS  12 

bits  11:0  May  be  used  to  load  MACRO  RAM  ADDRESS 
COUNTER  at  a  later  time 

CPH  I/O  ADDRESS  COUNTER  ADDRESS  13 

bits  23:0  Addresses  CPH  I/O  and  memory  space 

CPH  I/O  SYSTEM  ADDRESS  REGISTER  ADDRESS  14 

bits 5:0  Used to generate system address when downloading microcode
into CPH system(s)

IOP CONTROL REGISTER 0  ADDRESS 15

PAL FILES:

PAL DEVICE:

MICROCODE BITS
control  data
  12      2,1,0  Interrupt Mapping Select

                 BIT2  BIT1  BIT0
                  0     0     0    Interrupt table 0
                  0     0     1    Interrupt table 1
                  0     1     0    Interrupt table 2
                  0     1     1    Interrupt table 3
                  1     0     0    Interrupt table 4
                  1     0     1    Interrupt table 5
                  1     1     0    Interrupt table 6
                  1     1     1    Interrupt table 7

3.2.4.2 IOP Theory of Operation

SYSTEM  INTERRUPT 

A system interrupt indicates that one or more boards in a system
require servicing. The first step is to determine which system generated the
interrupt.

The interrupt service routine must poll bit 0 of each board's
configuration register, located at the board's base I/O address + 1, to
determine if that board caused the interrupt. If this bit reads 0, then that
board is generating a system interrupt. At this point the action to take is
entirely under software control. The only hardware requirement is that a 1 be
written to bit 0 at base I/O address + 1 on that board to clear the system
interrupt.
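The polling step can be sketched as follows. The `read_io` and `write_io` callables and the list of board base addresses are hypothetical stand-ins for the real I/O access routines; the sketch also assumes the other bits of the configuration register should be written back unchanged, which the text does not state.

```python
def service_system_interrupt(read_io, write_io, board_bases):
    """Poll each board and clear the interrupt on those requesting service.

    read_io(addr) -> int and write_io(addr, value) are caller-supplied
    I/O accessors (hypothetical). Returns the base addresses of the
    boards that were found interrupting.
    """
    interrupting = []
    for base in board_bases:
        config = read_io(base + 1)
        if config & 0x01 == 0:              # bit 0 reads 0: board is interrupting
            interrupting.append(base)
            # Board-specific servicing would go here (software defined).
            write_io(base + 1, config | 0x01)   # write 1 to bit 0 to clear
    return interrupting
```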
IOP RESET

Upon IOP reset or power-up, if the BOOT RAM/ROM jumper is in the RAM
position, a state machine presents a WCS 0000 instruction to the ADSP-1401
microsequencer. This places the microsequencer in the write control store
mode, and it begins outputting addresses starting at 0000H and counting
upwards. Code is then loaded from the host (selected by a jumper) into the
microsequencer microcode RAM. The entire RAM space of 0000H to 0FFFH must be
loaded with code or filled with IDLE instructions. Optionally, ROM may be
installed in place of RAM and the BOOT jumper set to ROM instead of RAM. In
this case the above load is skipped.

With the BOOT jumper set to RAM, the address continues to increment from
1000H. With the BOOT jumper set to ROM, the microsequencer address is
initialized to 1000H using the WCS instruction. At this point the
microsequencer is no longer loading its own microprogram memory, but is
loading the IOP macroinstruction memory. IOP macroinstruction memory must
again be completely filled with code or filler. This continues until the
microsequencer reaches address 2000H, where the hardware generates a
microsequencer reset and execution begins at microsequencer location 0000H.
The code beginning at 0000H initializes the microsequencer and then jumps to
the IOP macroinstruction at its program counter address 000H and continues
from there.

The microsequencer reset is generated by the combination of the BOOT
state machine in the BOOT state and the microsequencer address bit 13 high.
When this occurs, the microsequencer is reset and the BOOT state machine is
placed in the RUN mode.
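The address ranges in the reset description can be summarized with a small sketch. The function name and the returned labels are illustrative assumptions; the boundaries (0000H to 0FFFH for microcode, 1000H onward for macroinstructions, and address bit 13 ending the boot) follow the text above.

```python
MICROCODE_END = 0x1000   # microcode RAM occupies 0000H-0FFFH

def boot_target(address):
    """Classify a microsequencer address while the BOOT state is active."""
    if address & (1 << 13):
        # Address bit 13 high (2000H): hardware resets the microsequencer
        # and places the BOOT state machine in RUN mode.
        return "reset"
    if address < MICROCODE_END:
        return "microcode"   # word is loaded into microprogram memory
    return "macro"           # word is loaded into IOP macroinstruction memory
```

Because bit 13 first goes high at exactly 2000H, testing that single bit is enough to detect the end of the load.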

3.2.4.3 IOP Microsequencer

The IOP board has an extensive and independent microcontroller to manage
the several datapaths among the various EVA functional units. The
microsequencer is depicted in Figure 27, where it is shown that the PC (ISA),
HSIO, and SIO (VME Buffer) are controlled by a 48-bit microinstruction as
tabulated here.

Microinstruction Format

BITS   USAGE
  7    microinstruction opcode
  6    conditional select
 11    literal data
 16    data or relative jump address

A WCS is used for downloading IOP command sequences from the host
computer. The A11 counter (CNTR) may be used for loops. A10 and A12 are
additional address select registers for the sequencer, where each may be
assigned to the three external datapaths (PC, HSIO, SIO) for controlling the
next sequence. The Analog Devices ADSP-1401 microsequencer chip has been
selected because it supports interrupts, nested loops, and a stack. Booting
up the 1401 requires us to put address 20H onto the sequencer program counter.
This will always be the starting address for RESET as well.


The IOP can detect the arithmetic status of the CPH ALUs. With this
input via the condition code select MUX, the IOP can jump to error handling
routines as needed. Both a flag PAL and a MAP PAL support future
modifications to the IOP when device upgrades and subsequent address MAP
changes are needed. The previous IOP sections have described the control
register functions and the control signals which activate the datapaths
through this IOP board. Once the IOP has served as the traffic director of
the EVA, execution of code begins automatically and continues until the IOP
detects a flag set on any of the EVA boards. A set flag denotes some action
required of the IOP, such as "more data", "computation done", or "error
condition".

OPERATION  -  BOOT 

On power-up the microsequencer on the IOP board contains no
instructions. The BOOT state machine controls the board at this point,
enabling a path from the host interface (either the PC or SIO interface,
whichever is programmed into the PAL) to the ADSP-1401 microsequencer's
microprogram RAM. It also performs handshaking with the microsequencer's FLAG
input and the host interface's FIFORD PAL to control the timing between the
two, and loads the WCS 000H instruction into the microsequencer. The
microprogram RAM is 8k 48-bit words long, and the BOOT state machine will load
the first 8k 64-bit words of data appearing at the host interface into the
RAM, discarding the upper 16 bits of each word. At this point, the BOOT state
machine resets the microsequencer, causing it to start executing code at
address 000H. This boot code is required to start with a CONT instruction.

The remaining boot code will load the MACRO RAM. The MACRO RAM performs the
high-level instruction execution. It may be thought of as a sequence of
subroutine calls to the microsequencer. The MACRO RAM is 8k 16-bit words
long, although only the bottom half will be used for MACRO instructions. The
top 4k words will be used to store configuration data, etc. The boot code
will expect the first instruction to appear at the host interface to be a
LOMACRO, which will contain a starting address and the number of 16-bit data
words to be loaded. The upper 48 bits of each 64-bit data word from the
interface will be discarded. To expedite initial CPH tests, since the
configuration of the system will be known, the configuration data which would
normally be read from each of the boards upon reset may be loaded from the
host and programmed directly into the upper MACRO RAM. At this point all
downloading has been completed, and normal operation is to begin. All
interaction between the interfaces and the microsequencer is done under
interrupt control. The microsequencer boot code initializes the interrupt
table as follows:


IRQ8  IBM-PC Receive NEF  (HOST)
IRQ7  SYSTEM INT 0        (CPH)
IRQ6  SIO Receive NEF     (VPH)
IRQ5  IBM-PC STAT1        (HOST)
IRQ4  SIO STAT1
IRQ3
IRQ2  COUNTER A ZERO
IRQ1  COUNTER B ZERO


The boot code also reconfigures the interfaces if desired, such as
increasing the clock rate from the initial low rate it defaults to on
power-up.
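The word-width handling during the boot download can be illustrated with two masking helpers. The function names are hypothetical; the widths come from the boot description (64-bit words arrive from the host, the microprogram RAM keeps the lower 48 bits, and the MACRO RAM keeps the lower 16 bits).

```python
def to_microcode_word(word64):
    """Keep the lower 48 bits of a 64-bit host word (upper 16 discarded)."""
    return word64 & 0xFFFF_FFFF_FFFF

def to_macro_word(word64):
    """Keep the lower 16 bits of a 64-bit host word (upper 48 discarded)."""
    return word64 & 0xFFFF
```

A host-side loader therefore only needs to place meaningful data in the low-order bits of each 64-bit word; the remaining bits are ignored by the hardware.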
3.2.4.4 Processor-to-I/O Processor Communication Protocol

The Processor-to-I/O Processor communication protocol is as follows.
Three single-bit registers will exist for each bank of cache memory: BUSY,
INT, and LOCK. The BUSY register is used by the Processor to indicate to the
I/O Processor (IOP) that it is currently accessing that memory bank; the INT
register will inform the IOP when the Processor is finished with that bank;
and the LOCK register will prevent the Processor from accessing that bank
until the IOP is finished. An example utilizing these registers is given from
the viewpoint of first the Processor, and then the IOP.

PROCESSOR: The Processor examines the INT and LOCK bits and, if both are
inactive, sets the BUSY bit and begins processing that bank of memory. If the
INT bit or the LOCK bit is active, it has to wait until both are inactive
before setting the BUSY bit and processing the data. Once the Processor has
completed its processing, it sets the INT bit.

IOP: The IOP examines the BUSY bit and, if inactive, sets the LOCK bit
active. It then reexamines the BUSY bit and, if still inactive, begins
transferring the data. At completion of the data transfer, the INT bit is
cleared. If the IOP finds the BUSY bit active when it reexamines it, the
LOCK bit is immediately set to inactive, on the assumption that the
Processor has taken control of the memory bank during the time it took the IOP
to set the LOCK bit. The Processor always has priority. If upon the initial
examination the BUSY bit was active, the IOP must either use another memory
bank or wait until the BUSY one generates an INT and the data is transferred
out.
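The handshake above can be modeled in software to check the ordering of the steps. This is a sketch, not the hardware design: the class and function names are hypothetical, and two points the text leaves open are filled in as labeled assumptions (the Processor clears BUSY when it sets INT, and the IOP clears LOCK once its transfer completes).

```python
class Bank:
    """The three single-bit registers of one cache memory bank."""
    def __init__(self):
        self.busy = False
        self.int_ = False
        self.lock = False

def processor_try_start(bank):
    """Processor side: start only if INT and LOCK are both inactive."""
    if bank.int_ or bank.lock:
        return False          # must wait
    bank.busy = True
    return True

def processor_finish(bank):
    bank.busy = False         # assumption: BUSY is dropped when done
    bank.int_ = True          # tells the IOP the bank is ready

def iop_request(bank):
    """IOP step 1: examine BUSY and, if inactive, set LOCK."""
    if bank.busy:
        return False          # use another bank or wait for INT
    bank.lock = True
    return True

def iop_confirm(bank, transfer):
    """IOP step 2: reexamine BUSY; back off if the Processor won the race."""
    if bank.busy:
        bank.lock = False     # the Processor always has priority
        return False
    transfer()
    bank.int_ = False         # INT cleared at completion of the transfer
    bank.lock = False         # assumption: LOCK released after the transfer
    return True
```

Splitting the IOP side into two steps makes the race window explicit: the Processor may set BUSY after the IOP's first check but before LOCK takes effect, which is why the second check is needed.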


In addition, in order to prevent the IOP from having to read the LOCK
register, OR or AND one bit, and write the LOCK register back, logic should be
incorporated into the memory boards to accomplish these tasks. One method
would be to have four register address bits to select which of sixteen bits
will be changed, and one register control bit to indicate if the bit should be
set or cleared.
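The proposed single-write bit update can be sketched as a function of the four select bits and the one set/clear control bit. The function name and the 16-bit register width follow directly from "which of sixteen bits"; everything else is illustrative.

```python
def modify_bit(register, select, set_bit):
    """Set or clear one of sixteen bits without a read-modify-write by the IOP.

    select  -- 4-bit field choosing the bit (0..15)
    set_bit -- control bit: True sets the selected bit, False clears it
    """
    if not 0 <= select <= 15:
        raise ValueError("select must fit in four bits")
    mask = 1 << select
    if set_bit:
        return (register | mask) & 0xFFFF
    return register & ~mask & 0xFFFF
```

With this logic on the memory board, the IOP issues one write carrying five bits instead of a read, a logical operation, and a write.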

The  memory  BUSY  register  and  INT  register  must  also  be  added  to  the 
High-Speed  I/O  (HSIO)  bus  memory  address  space,  probably  by  utilizing  the 
unused  bank  address  7. 

3.2.5 VPH/CPH VME Buffer

The VME buffer board is the primary linkage between the CPH and the VPH.
This, however, is not its only function. When operating apart from the VPH,
the CPH can use the VME buffer board to connect to a 6U VME backplane. When
used with the VPH, the VME buffer board plugs into the VPH backplane directly.
This board also incorporates the augmented interface for the VPH so that
parallel 64-bit data transfers between it and the CPH can take place. The
board is completely fabricated but untested as yet. A schematic has been
created for the board and is titled Serial IO board. As the board is
basically a gateway for the VPH and CPH, the majority of the circuits are
transceivers and PALs for controlling activity. The subsequent state machine
design is basic. The major feature of this board is the Gazelle HOT ROD GaAs
chips to maintain the 80 MHz throughput between the CPH and VPH.
3.2.5.1 Purpose


This VME buffer board floorplan shown in Figure 28 is designed to serve
as a high-speed interface between the VPH Processor Board (designed for the
VME bus) and the CPH's I/O Processor Board, which connects to a proprietary
backplane. The goal of this board is to link the two systems in an efficient
manner to maximize data bandwidth and to minimize the amount of I/O necessary
to control the data transfers. This board should accept data from both the
VME bus (data width 4 bytes at 10 MHz) as well as the proprietary 32-bit data
connector which connects directly to the VPH board. Since this extra 32-bit
data connector is synchronized to the VME data transfer bus, it also transfers
4 bytes at 10 MHz, for a total data transfer rate of 8 bytes at 10 MHz, or 80
MBytes/sec, between the VPH and Serial I/O Board. Actual performance is
estimated to be approximately 67 MBytes/sec, assuming an immediate response
from the VPH to DTACK (Data Transfer Acknowledge). Faster rates may be
obtainable by fine-tuning the Serial I/O Board's DTACK timing for both reads
and writes once the boards are integrated into a system and actual timing
measurements may be taken. Dipswitches have been designed in so that the
DTACK timing may be adjusted individually for both reads and writes from/to
the FIFOs in 10 nanosecond increments. Depending on the amount of the change,
the FIFORD and/or FIFOWR PALs may also need to be reprogrammed.
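The parallel-path figures quoted above follow from simple arithmetic, worked here as a check. The variable and function names are illustrative; the input numbers (4 bytes per path, 10 MHz, 67 MBytes/sec estimated) are taken from the text.

```python
def rate_mbytes_per_sec(bytes_per_transfer, transfers_mhz):
    """Peak rate in MBytes/sec for a bus moving N bytes per transfer."""
    return bytes_per_transfer * transfers_mhz

vme_only = rate_mbytes_per_sec(4, 10)      # VME bus alone
combined = rate_mbytes_per_sec(4 + 4, 10)  # VME bus plus 32-bit connector
estimated = 67                              # estimated actual MBytes/sec
efficiency = estimated / combined           # fraction of the peak rate
```

The estimated 67 MBytes/sec is therefore about 84 percent of the 80 MBytes/sec peak, which is the margin the DTACK timing adjustments are meant to recover.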

At the serial interface, Gazelle HOT ROD ICs have been used which can
transfer data serially at a rate of 500 Mbits/sec or 62.5 MBytes/sec. The
actual serial baud rate is 625 MHz due to the 4-to-5 bit encoding scheme used.
The extra encoding bits are invisible to the user because they are inserted at
the transmitter and stripped at the receiver. Data is presented to the HOT ROD
ICs 40 bits at a time: 32 bits are data, 4 bits are parity, and 4 bits are
control. These 40 bits are latched at a 12.5 MHz rate. Since only 32 of the
bits are data, the actual data transfer rate calculates out to be 50
MBytes/sec. If this rate isn't fast enough, Gazelle also makes 800 Mbit/sec
and will soon make 1000 Mbit/sec ICs, which should be interchangeable with the
ICs now in the design as long as the PALs which control them are suitably
fast. Faster Gazelle ICs would also mean faster FIFOs must be used. Only one
speed upgrade is currently available from that which is already being used:
35 nsec FIFOs are now being used, whereas 25 nsec FIFOs are the fastest
available at this time and are significantly more expensive. Faster FIFOs may
also bring the VME data transfer rate up to its maximum of 80 MBytes/sec
(including the proprietary 32-bit data connector). The Gazelle ICs directly
drive 50-ohm coax cable for short distances. For longer distances, it is
suggested that an amplifier be used for single-ended operation or that
fiber-optic cable be used.
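The serial-link rates quoted above can be reproduced from the word format and latch rate. This is a worked check with illustrative variable names; the inputs (40-bit words, 32 data bits, 12.5 MHz latch rate, 4-to-5 encoding) are from the text.

```python
WORD_BITS = 40          # 32 data + 4 parity + 4 control
DATA_BITS = 32
WORD_RATE_MHZ = 12.5    # 40-bit words latched per microsecond

link_mbits = WORD_BITS * WORD_RATE_MHZ            # user bits on the link
link_mbytes = link_mbits / 8                      # same rate in MBytes/sec
line_rate = link_mbits * 5 / 4                    # after 4-to-5 bit encoding
payload_mbytes = DATA_BITS * WORD_RATE_MHZ / 8    # data bits only
```

The three rates in the paragraph (500 Mbits/sec, 625 MHz on the wire, 50 MBytes/sec of payload) are thus one rate seen at three points: before encoding, after encoding, and after removing parity and control bits.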

3.2.5.2 VME Buffer Board Bus Limitations

The VME buffer board uses a subset of the VME standard bus because the
board functions only as a special augmented interface to the VPH. The board
transfers the upper 32 data bits so that a 64-bit parallel bus couples the VPH
and CPH. Its VME limitations are described below.
Figure 28. VME Buffer Board Floorplan

Data Transfer Bus


-  BERR* is not supported since all addresses are occupied. The only
illegal board accesses are:

1.  Attempt to write to the Interrupt Status Register

2.  Attempt to read from the Interrupt Status ID Register

This board supports neither D16:BLT (double-byte) nor D08(EO):BLT
(single-byte) block transfers. When the board is configured without the
extended 32 bits of data FIFO, it accepts all Quad-byte, Double-byte, and
Single-byte data reads and writes. When configured with the extended 32 bits
of data FIFO, it supports proprietary Octal-byte reads and writes, although
each appears to the VME bus as a Quad-byte transfer (D32:BLT). When the board
is configured without the extended 32 bits of data FIFO, it accepts Quad-byte
block transfers. When configured with the extended 32 bits of data FIFO, it
supports proprietary Octal-byte block transfers, although each appears to the
VME bus as a Quad-byte block transfer.

This board does not support RMW (read-modify-write) simply because
reading and writing are done from separate FIFOs. When the board is
configured without the extended 32 bits of data FIFO, it accepts Triple-byte
reads and writes. Its Priority Interrupt Bus has the signals I(1), I(2),
I(3), I(4), I(5), I(6), I(7), and can generate an interrupt on any of the
seven interrupt request lines IRQ1* through IRQ7*.

The VME signal D08(O) drives D00-D07 in response to a valid 8-bit,
16-bit, or 32-bit Interrupt Acknowledge cycle. The board is a Release On
Acknowledge (ROAK) interrupter: an interrupt request is released upon a
status ID register read.

3.2.5.3 Control Registers of the VME Buffer Board

The board contains control, interrupt status, interrupt mask, and
interrupt status-ID registers. Their addresses and bit definitions are as
follows:


CONTROL  REGISTER 

All bits are active high and are reset to zero on power-up or VME system
reset.

Bit  Name      Description
 0   IRESET    Reset Latched Interrupts
 1   FRESET    Reset FIFOs
 2   ENINT     Enable VME Interrupts (DEFAULT: interrupts disabled)
 3   INTSEL1   Bits 3-5 select which VME Interrupt Request Line is
 4   INTSEL2   pulled low when an on-board interrupt is generated
 5   INTSEL3   (DEFAULT: 000, no interrupt selected)
 6   DT        Data Type
                 0  Standard 32-bit VME data (DEFAULT)
                 1  Extended to include 32-bit proprietary
 7   SWINT     Software Interrupt
 8   RLOOPEN   Enable Receiver L Input (DEFAULT: S Input)
 9   XLOOPEN   Enable Transmitter L Output (DEFAULT: S Output)
10   CXSTAT0   Control Transmit Status Bit 0 (DEFAULT: LOW)
               NOTE: This signal is inverted prior to being transmitted.
11   XSTAT1    Transmit Status Bit 1
12   XSEL0     Bits 12-14 form the Transmitter Control Bit 0 Source
13   XSEL1     Address (DEFAULT: 000)
14   XSEL2     (see table below)
15

TRANSMITTER CONTROL BIT 0 SOURCE ADDRESS

Address  Control Bit 0 Transmitted
   0     LOW       Always LOW (DEFAULT)
   1     RFF       Receiver Full Flag
   2     RAFF      Receiver Almost-Full Flag
   3     RHFF      Receiver Half-Full Flag
   4     RAEF      Receiver Almost-Empty Flag
   5     REF       Receiver Empty Flag
   6     CXSTAT0   Control Transmit Bit 0
   7     HIGH      Always HIGH

INTERRUPT STATUS REGISTER

Bit  Name     Description
 0   XFF      Transmitter FIFO Full Flag
 1   XAFF     Transmitter FIFO Almost-Full Flag
 2   XHFF     Transmitter Half-Full Flag
 3   XAEF     Transmitter Almost-Empty Flag
 4   XEF      Transmitter Empty Flag
 5   RFF      Receiver FIFO Full Flag
 6   RAFF     Receiver FIFO Almost-Full Flag
 7   RHFF     Receiver Half-Full Flag
 8   RAEF     Receiver Almost-Empty Flag
 9   REF      Receiver Empty Flag
10   PARITY   Parity Error
11   RSTAT0   Receiver Status Bit 0
12   RSTAT1   Receiver Status Bit 1
13   RECERR   Receiver Data Error
14   SWINT    Software Interrupt
15
INTERRUPT MASK REGISTER

All bits are active high and are reset to one on power-up or VME system
reset (all interrupts are initially masked).

Bit  Name     Description
 0   XFF      Transmitter FIFO Full Flag Mask
 1   XAFF     Transmitter FIFO Almost-Full Flag Mask
 2   XHFF     Transmitter Half-Full Flag Mask
 3   XAEF     Transmitter Almost-Empty Flag Mask
 4   XEF      Transmitter Empty Flag Mask
 5   RFF      Receiver FIFO Full Flag Mask
 6   RAFF     Receiver FIFO Almost-Full Flag Mask
 7   RHFF     Receiver Half-Full Flag Mask
 8   RAEF     Receiver Almost-Empty Flag Mask
 9   REF      Receiver Empty Flag Mask
10   PARITY   Parity Error Mask
11   RSTAT0   Receiver Status Bit 0 Mask
12   RSTAT1   Receiver Status Bit 1 Mask
13   RECERR   Receiver Data Error Mask
14   SWINT    Software Interrupt Mask
15

INTERRUPT  STATUS  ID 

This is simply an 8-bit register which is written to by a VME bus
master. During an Interrupt Acknowledge cycle, the contents of this register
are placed onto the VME data transfer bus in response to a valid IACKIN
address.


REGISTER ADDRESSES

Register              Read/Write   Address Offset
CONTROL REGISTER      R/W          100h
INTERRUPT STATUS      R            104h
INTERRUPT MASK        R/W          108h
INTERRUPT STATUS ID   W            10Ch


3.2.5.4 Address Select on the VME Buffer Board


Three 8-position dipswitches reside on the board for selecting both the
FIFO address as well as the register address block. These two blocks must be
contiguous, with the FIFO block residing in the lowest 256-byte block and the
registers in the upper. Neither the addressing for the FIFOs nor for the
registers is fully decoded, leading to address foldover. The FIFOs respond
to any address within their 256-byte block, and the registers each respond to
sixteen different locations (they ignore the upper 4 address bits of the
lowest byte).

The three dipswitches are:

S1  address bits A31 - A24
S2  address bits A23 - A16
S3  address bits A15 - A09

For each dipswitch OPEN represents a HIGH, CLOSED represents a LOW.
Position 1 represents the most significant bit of that address byte, with
position 8 representing the least.
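The foldover behavior and the dipswitch bit ordering described in this section can be illustrated with a short sketch. The function names and the dictionary are hypothetical. Offsets are measured from the start of the FIFO block, the register offsets 100h through 10Ch come from the register address table, and offsets that are not aligned to a register are simply not modeled.

```python
# Low nibble of the offset -> register, from the 100h/104h/108h/10Ch table.
REGISTERS = {
    0x0: "CONTROL",
    0x4: "INTERRUPT STATUS",
    0x8: "INTERRUPT MASK",
    0xC: "INTERRUPT STATUS ID",
}

def decode(offset):
    """Return what responds at a byte offset from the board base address."""
    if not 0 <= offset < 0x200:
        return None                      # outside the two 256-byte blocks
    if offset < 0x100:
        return "FIFO"                    # any address in the lower block
    # Registers ignore the upper 4 bits of the lowest address byte, so
    # each one appears at sixteen locations in the upper block.
    return REGISTERS.get(offset & 0x0F)

def switch_byte(positions):
    """Convert 8 switch positions (position 1 first) to an address byte.

    True means OPEN (HIGH); position 1 is the most significant bit.
    """
    value = 0
    for is_open in positions:
        value = (value << 1) | int(is_open)
    return value
```

For instance, the control register at 100h also answers at 110h, 120h, and so on up to 1F0h.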

3.2.5.5 VME Buffer Board Interrupts

The board may generate an interrupt in response to any of the following
conditions:

 0   XFF      Transmitter FIFO Full Flag
 1   XAFF     Transmitter FIFO Almost-Full Flag
 2   XHFF     Transmitter Half-Full Flag
 3   XAEF     Transmitter Almost-Empty Flag
 4   XEF      Transmitter Empty Flag
 5   RFF      Receiver FIFO Full Flag
 6   RAFF     Receiver FIFO Almost-Full Flag
 7   RHFF     Receiver Half-Full Flag
 8   RAEF     Receiver Almost-Empty Flag
 9   REF      Receiver Empty Flag
10   PARITY   Parity Error
11   RSTAT0   Receiver Status Bit 0
12   RSTAT1   Receiver Status Bit 1
13   RECERR   Receiver Data Error
14   SWINT    Software Interrupt
15

All of the above signals are active low. When active, a rising edge on
the 25 MHz clock latches them into their respective INTR4 PALs (U62-U65),
causing the INTR4 PAL to output a low on its INT output. The VMEINTSL PAL
(U66), upon detecting one or more of its INTx inputs low, generates a high on
the IRQy output that is addressed by the SELy inputs, and also a low on its
INT output. The SELy inputs are programmable in the CONTROL REGISTER (U34)
and select which VME interrupt request line is being used by the board. The
CONTROL REGISTER ENINT (Enable Interrupt) bit must be set to one to enable the
VME interrupt request open-collector drivers (U39).

RESPONDING TO INTERRUPT ACKNOWLEDGE DAISY-CHAIN INPUT

Upon detecting a low signal on its IACKIN input, the VMEIACK PAL (U67)
checks whether three conditions are met before responding. First, its INT
input must be low, indicating that an on-board interrupt is pending. Second,
the ENINT input must be high, indicating that interrupts are enabled. Third,
the address received on the A01, A02, and A03 inputs must match that on the
SEL0, SEL1, and SEL2 inputs (and must not be 0). If all of these conditions
are met, the IDEN output is set to active low; otherwise the IACKOUT output
is set to active low, passing the interrupt acknowledge along to the next
board in the system. If IDEN is set low, this signal is passed to the Status
ID register (U36) OERB (output enable read-back) control input, causing the
register to output its contents onto the data bus. IDEN also connects to the
MUXCTRL PAL (U51), enabling the VME bus transceivers (U4-U11).
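
The three-condition check described above can be sketched in Python. This is only an illustrative model of the decision, not the PAL equations: the function name and the boolean/integer argument encoding are assumptions made for the example.

```python
def iack_response(iackin_low, int_low, enint_high, addr, sel):
    """Model which output the VMEIACK PAL drives low.

    iackin_low -- True when IACKIN is low (an acknowledge cycle has arrived)
    int_low    -- True when the on-board INT line is low (interrupt pending)
    enint_high -- True when ENINT is high (interrupts enabled)
    addr       -- 3-bit level seen on A01..A03
    sel        -- 3-bit level programmed on SEL0..SEL2
    Returns "IDEN", "IACKOUT", or None when no acknowledge is present.
    """
    if not iackin_low:
        return None
    level_match = (addr & 0x7) == (sel & 0x7) and (sel & 0x7) != 0
    if int_low and enint_high and level_match:
        return "IDEN"      # this board answers and drives its Status ID
    return "IACKOUT"       # pass the acknowledge down the daisy chain
```

For example, a pending, enabled interrupt at level 3 answers an acknowledge for level 3 but forwards an acknowledge for level 5.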
3.2.6 PC Interface Board


The primary code development interface to EVA is via a PC Interface
board (PC-INT), shown in Figure 29. Space Tech's high-speed PC interface is
designed for versatile interfacing to virtually any type of PC outboard
hardware. The interface is symmetric; that is, the two "ends" of the
interface circuitry are identical with the exception of the glue logic tying
the interface to the local environment. The interface is bidirectional.
Interconnect is done via twisted pair cable. RS-422 drivers/receivers are
used to ensure noise immunity and allow high throughput; well-written drivers
should allow this interface to handle data transfers at the full ISA bus data
rate. An architectural/functional description of the interface as it appears
to the PC/AT system follows.

The interface is accessed in the PC's I/O Address Space (as opposed to
its Memory Address Space), and it occupies a 4-byte section of this space.
The base address at which the interface resides is selectable via an 8-pole
DIP switch on the interface board. Driver software should be configurable to
look for the interface at any address within the I/O Space dedicated to slave
add-ons (the first 256 locations are dedicated to the platform itself; the
next 768 locations are available for slave cards).

The interface is a 16-bit resource whose base address must be a multiple
of 4. The least significant address bit will always be 0, since the board is
a 16-bit device. Two addresses - the base address and the base address plus
two - access different resources on the interface. These resources are:

Read  FIFO 
Write  FIFO 
Control  Register 
Status  Register 
Interrupt  Mask  Register 
Interrupt  Register 

The Read and Write FIFOs are where input and output data, respectively,
are queued up as they pass to and from the board. The FIFOs share an address;
the cycle type (READ or WRITE) determines which FIFO is accessed. The Control
Register is a write-only location. Bits within this register set the rate at
which data is clocked across the interconnect, enable or disable the FIFOs,
enable or disable parity checking and set its sense, enable, disable, and
clear interrupts, select whether an access to the FIFO/Interrupt Mask
Register location is destined for the FIFOs or the Interrupt Registers, and
set the interrupt level passed on to the PC in response to a valid interrupt
condition. Two additional bits are multipurpose, undedicated interface lines
which travel directly across the interface without passing through the Write
FIFO. (These two bits appear as two bits in the Status Register at the
opposite end of the interface.)

The Status Register is a read-only location (address coincident with the
Control Register) which provides access to status flags for the FIFOs. Both
Read and Write FIFO flags may be observed via the Status Register. These
flags are Full, Almost Full, Almost Empty, and Empty. Another bit indicates
that a parity error has been detected. Two additional bits are a direct
reflection of the two multipurpose bits from the Control Register at the
opposite end.

Figure 29. PC Interface Board Layout

The Interrupt Mask Register is a read/write location which provides a
means of selectively generating a PC interrupt based on the conditions of the
FIFO flags, the Parity bit in the Status Register, or the assertion of either
of the multipurpose bits from the Status Register. A READ of the Interrupt
Register provides a "snapshot" of the interrupt conditions which have
occurred since the last clearing of the Interrupt Register. (This provides a
means of determining what type of service is required when more than a single
condition may cause an interrupt.) The location of the Interrupt Mask
Register and the Interrupt Register is coincident with the Read and Write
FIFOs; a bit in the Control Register determines whether an access to this
location is destined for the FIFOs or the Interrupt Registers.

A  description  of  each  of  the  registers  and  the  bits  they  contain 
follows . 

CONTROL  REGISTER 


Bit   Name
15    X (not used)
14    X (not used)
13    IRQ1
12    IRQ2
11    CLRINT*
10    ENINT
 9    SETMASK
 8    ODD*/EVEN
 7    CLK2
 6    CLK1
 5    CLK0
 4    RESET
 3    RECEIVE
 2    SEND
 1    STAT1
 0    STAT0


Bits 15 and 14 are not used, so they are don't cares when writing the
register.

Bits 13 and 12 determine which PC interrupt is asserted when a valid
interrupt condition exists and interrupts are enabled:

Bit 13   Bit 12   Interrupt selected

  0        0      IRQ10
  0        1      IRQ11
  1        0      IRQ12
  1        1      IRQ15


Bit 11, when asserted, clears all interrupt flags. Also, while this bit
is asserted all interrupts are disabled, so to clear interrupts but not
disable them, this register must be written to twice - first with Bit 11 = 0,
then with Bit 11 = 1.

Bit 10, when asserted, enables generation of interrupts. This is the
intended method of enabling/disabling interrupts. If Bit 10 is negated,
interrupts will not be generated, but the Interrupt Register will still be
updated as valid interrupt conditions occur. If Bit 11 is asserted, interrupt
flags will NOT be updated and a valid interrupt condition will then be lost.

Bit 9 determines whether an access to the FIFO/Interrupt Mask Register
address will be directed to the FIFOs or the Interrupt Registers. When the
bit is asserted (= 1), an access is directed to the Interrupt Registers.

Bit 8 determines the sense of parity. Bit 8 = 0 selects odd parity, and
1 selects even parity.

Bits  7,  6,  and  5  select  the  clock  rate  used  to  clock  data  across  the 
interface.  The  value  of  these  bits  determines  the  division  applied  to  the 
local  clock  which  runs  at  16  MHz.  The  values  and  corresponding  division 
factors  are: 


CLK2 | CLK1 | CLK0 | Divisor

  0  |  0   |  0   |   32
  0  |  0   |  1   |   16
  0  |  1   |  0   |    8
  0  |  1   |  1   |    4
  1  |  X   |  0   |    2
  1  |  X   |  1   |    1
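
As a worked illustration of the clock selection, the following Python sketch maps the three CLK bits to a divisor and the resulting interface clock. It assumes the divisor table reads 000 -> 32, 001 -> 16, 010 -> 8, 011 -> 4, 1X0 -> 2, 1X1 -> 1 (CLK1 being a don't care when CLK2 is set) and uses the 16 MHz local clock stated above; the function names are hypothetical.

```python
LOCAL_CLOCK_HZ = 16_000_000  # local clock rate given in the text


def clock_divisor(clk2, clk1, clk0):
    """Return the division factor selected by Control Register bits 7, 6, 5."""
    if clk2 == 0:
        # 00 -> 32, 01 -> 16, 10 -> 8, 11 -> 4
        return 32 >> (2 * clk1 + clk0)
    # CLK2 = 1: CLK1 is a don't care; CLK0 picks 2 or 1
    return 2 >> clk0


def interface_clock_hz(clk2, clk1, clk0):
    """Return the rate at which data is clocked across the interface."""
    return LOCAL_CLOCK_HZ // clock_divisor(clk2, clk1, clk0)
```

With all three bits cleared (the state after a reset) the sketch selects the largest divisor and therefore the lowest data rate.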

Bit 4 is the interface reset bit. A 1 written to this bit causes all
FIFOs to be cleared and zeroes to be written to all bits of all registers.
(This causes the bit to self clear.)

Bit 3 is the enable bit for the receive (READ) FIFO. A 0 written to
this bit prevents the READ FIFO from receiving any new data across the
interface, but does not prevent data already in the FIFO from being read by
the PC.

Bit 2 is the enable bit for the send (WRITE) FIFO. A 0 written to this
bit prevents the WRITE FIFO from sending data out across the interface, but
does not prevent the PC from writing new data to the FIFO.

Bits 1 and 0 are the multipurpose interface bits. These bits propagate
directly across the interface and appear as bits 1 and 0 in the Status
Register at the other end of the interface. They may be used as interrupt
lines, or for whatever kind of semaphores may be called for. These bits DO
NOT pass through the FIFOs at either end.

STATUS REGISTER


Bit     Name
15-11   X (not used)
10      RFAF
 9      RFAE
 8      RFF*
 7      RFE*
 6      WFAF
 5      WFAE
 4      WFF*
 3      WFE*
 2      PARITY
 1      STAT1
 0      STAT0


Bits 11-15 are not used and should be disregarded when reading the
Status Register.

Bit 10 is the Read FIFO Almost Full flag. A 0 in this bit indicates
that the READ FIFO is almost full.

Bit 9 - Read FIFO Almost Empty flag.

Bit 8 - Read FIFO Full flag.

Bit 7 - Read FIFO Empty flag.

Bit 6 - Write FIFO Almost Full flag.

Bit 5 - Write FIFO Almost Empty flag.

Bit 4 - Write FIFO Full flag.

Bit 3 - Write FIFO Empty flag.

Bit 2 is the parity error flag. A 0 in this bit indicates that a parity
error has occurred.

Bits 1 and 0 are a direct reflection of the STAT1 and STAT0 bits from the
Control Register at the opposite end.
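
A driver would typically unpack the Status Register into named flags. The Python sketch below shows one way to do so. The text states the active-low sense only for bits 10 and 2; treating all FIFO flags (bits 10 through 3) as active low is an assumption of this example, as are the flag names and the function name.

```python
# Bit position -> descriptive name (names are illustrative, not from the report)
STATUS_FLAGS = {
    10: "read_almost_full",
    9: "read_almost_empty",
    8: "read_full",
    7: "read_empty",
    6: "write_almost_full",
    5: "write_almost_empty",
    4: "write_full",
    3: "write_empty",
    2: "parity_error",
}


def decode_status(word):
    """Decode a 16-bit Status Register value into a dict of conditions.

    Assumption: flag bits are active low, so a 0 bit means the condition
    is present. Bits 15-11 are ignored, as the text requires.
    """
    flags = {name: ((word >> bit) & 1) == 0 for bit, name in STATUS_FLAGS.items()}
    flags["stat1"] = (word >> 1) & 1  # mirrors STAT1 from the far-end Control Register
    flags["stat0"] = word & 1         # mirrors STAT0 from the far-end Control Register
    return flags
```

Under this assumption a value of $07FC decodes as "no condition present", and clearing bit 10 alone reports an almost-full Read FIFO.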

INTERRUPT MASK REGISTER

The template for the Interrupt Mask Register is identical to the Status
Register. A 1 in any bit position of the Interrupt Mask Register allows the
corresponding bit in the Status Register to generate an interrupt; a 0 masks
it out.

The addresses at which the various interface resources are located are
shown below.

Base Address (SETMASK = 0)   -  READ or WRITE FIFO

Base Address (SETMASK = 1)   -  Interrupt Mask Register (Write)
                                or Interrupt Register (Read)

Base Address + 2             -  Control Register (Write) or Status
                                Register (Read)
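
The address map can be summarized as a small lookup. This Python sketch is only a restatement of the list above; the function name and its arguments are hypothetical.

```python
def resource_at(offset, is_read, setmask):
    """Name the resource selected by an access to base address + offset.

    offset  -- 0 or 2 (byte offset from the board's base address)
    is_read -- True for a READ cycle, False for a WRITE cycle
    setmask -- current value of Control Register bit 9 (SETMASK)
    """
    if offset == 0:
        if setmask:
            return "Interrupt Register" if is_read else "Interrupt Mask Register"
        return "Read FIFO" if is_read else "Write FIFO"
    if offset == 2:
        return "Status Register" if is_read else "Control Register"
    raise ValueError("the interface decodes only base and base + 2")
```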

A better understanding of the register function can be obtained by
reviewing the following pseudo-code for testing 2 PC Interface boards. A
simple program is suggested.

PROGRAM 1: Write 16-bit data out to one PC-INT board and receive it via
another PC-INT board.

To test this program, install two PC-INT boards into the PC and connect
the two board connectors together so that the output of one board is the input
to the other. The procedure is to send the main memory data out one board and
into the other. Set the sending board's base address to 340. Set the
receiving board's base address to 360. Configure these addresses with the
DIP switches on each board. Although the FIFOs are 2k words deep, only 256
words are being transferred. No check for parity errors is done. NOTE:
Locations 342 and 362 are control registers when writing to them and
status registers when reading from them. Locations 340 and 360 are data
registers when bit 9 in 342 and 362 is cleared, so data is then transferable
via locations 340 and 360. However, when bit 9 is set to 1 in 342 and 362,
then 340 and 360 are interrupt mask registers when writing to them and
interrupt registers when reading from them.

The program is described in single step manner only to help you
understand the procedures. An actual program would combine several of the
steps into a single "load" assembly language instruction.

1. CLEAR and INIT CONTROL REGISTERS (in 342 and 362)

set cr 4 to 1 in 342 and 362          /reset bit in the control
                                      registers, clears registers and
                                      FIFOs/

set cr 7,6,5 to 001 in 342 and 362    /500kps baud rate in both boards/

2. INIT CONTROL REGISTER base addresses 342 and 362 to connect them to the
interrupt mask register

set bit 9 to one in 342 and 362

/allows 340 and 360 to write to interrupt
mask reg instead of data registers/

3. INITIALIZE INTERRUPT MASK REGISTER

Load mask bits into 340 (note that 340 now writes to the mask register
instead of the data register because bit 9 in the control register was just
set to one. Later, we'll clear this bit in the control register in order to
write to the data register.)

set bit 6 to one in 340

/the write FIFO will interrupt the PC
when it is almost full/


4. ENABLE INTERRUPTS

set bits 13,12,11,10 in 342 and clear bit 9 in 342 so 340 is a "data" register
now

/use IRQ 15 to interrupt PC when
write FIFO is almost full in 342
(hence, stop transmitting)/

set bits 13,12,11,10 to 1011 in 362 and clear bit 9 in 362 so 360 is a data
register

/use IRQ 12 to interrupt PC when
read FIFO is almost full in 360/

5. WRITE DATA TO 340 (DATA PORT) (if FIFO is empty or almost empty, write a
block <2k words)

Move 16-bit words from main memory and write each word into address 340.
Don't write more than 2k words, otherwise the FIFO will overflow in the board.

set bit 2 of 342 to 1 and bit 3 to 0    /location 340 becomes a
                                        transmitting board/

set bit 3 of 362 to 1 and bit 2 to 0    /location 360 becomes a
                                        receiving board/

set bit 9 of 362 to 1                   /to be able to set interrupt mask
                                        into 360 instead of sending
                                        erroneous data out 360/

set bit 10 of 360 to 1                  /enables the read FIFO almost
                                        full interrupt flag/

clear bit 9 of 342                      /340 is now a data port again/

write 256 16-bit words to 340

read bit 4 in 342 and don't write til set (FIFO is not full if flag is set)
if set write next word and check bit 4 (ok to send a word)

6. READ data from 360

clear bit 9 of 362                      /now 360 is a data port/

read 256 16-bit words from 360

read bit 7 of 342 before each read and if cleared then read the word

read bit 7 of 362 after each read. If set stop reading and wait til cleared.

7. IF INTERRUPTS OCCUR

If IRQ 15 occurs from the transmitting board (board sending data out of
the PC), then pause writing to 340 to allow 340 to open space in its FIFO by
dumping out to 360.

If IRQ 12 occurs from the receiving board (board sending data back into
the PC), then stop writing to 340 because 360 is almost full and can't store
any more data from the transmitting board.
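
Since an actual program would combine several of these single-bit steps into one register write, it can help to see the Control Register values they add up to. The Python sketch below composes a control word from named fields using the PC-end bit layout (bits 13-12 IRQ select, 11 CLRINT*, 10 ENINT, 9 SETMASK, 8 ODD*/EVEN, 7-5 CLK, 4 RESET, 3 RECEIVE, 2 SEND, 1-0 STAT). The function and parameter names are hypothetical, and the combined values are this example's reading of steps 1, 4, and 5, not values taken from the report.

```python
# IRQ line -> encoding of Control Register bits 13 and 12
IRQ_SELECT = {10: 0b00, 11: 0b01, 12: 0b10, 15: 0b11}


def control_word(irq=10, clrint_n=1, enint=0, setmask=0, even_parity=0,
                 clk=0, reset=0, receive=0, send=0, stat1=0, stat0=0):
    """Compose a 16-bit PC-end Control Register value.

    clrint_n is the raw CLRINT* bit: 0 asserts the clear, 1 leaves
    interrupts usable. clk is the 3-bit CLK2..CLK0 field.
    """
    return (IRQ_SELECT[irq] << 12 | clrint_n << 11 | enint << 10 |
            setmask << 9 | even_parity << 8 | (clk & 0x7) << 5 |
            reset << 4 | receive << 3 | send << 2 | stat1 << 1 | stat0)


# Sending board at 342: IRQ 15, interrupts enabled, CLK = 001, SEND on
sender_cw = control_word(irq=15, enint=1, clk=0b001, send=1)
# Receiving board at 362: IRQ 12, interrupts enabled, CLK = 001, RECEIVE on
receiver_cw = control_word(irq=12, enint=1, clk=0b001, receive=1)
```

Keeping SETMASK as an explicit argument makes it harder to leave bit 9 set by accident, which would redirect data writes into the Interrupt Mask Register.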

3.2.6.1 VPH-End PC Interface

The PC interface at the VPH end differs slightly from the PC end
interface. The architecture is essentially the same, but the interface
resources are accessed a little differently than at the PC end. The resources
at the VPH end are accessed at the following 68020 addresses:

Interface Base Address     -  $24 0000

Read/Write FIFOs           -  $24 0000

Status/Control Registers   -  $24 0004

Interrupt Registers        -  $24 0008

Accesses to all of these resources are longword (32-bit) accesses,
although only the lowest 16 bits are utilized.

The Status, Interrupt, and Interrupt Mask Registers are identical to
those at the PC end. The Control Register is slightly different due to the
difference in local environments. The mapping of the Control Register is shown
below.


Control Register - VPH end

STAT 0 & 1 - These are general purpose interface bits. A bit written to
STAT 0 or 1 in the Control Register appears as STAT 0 or 1 in the Status
Register at the other end of the interface.

SEND - This bit is an enable for the sending of data across the
interface. A 0 written to this bit does not disable the ability to write to
the output FIFO, but does prevent data in the output FIFO from being sent
until a 1 is written to this bit.

RECEIVE - This bit is an enable for the receiving of data across the
interface. A 0 written to this bit does not disable the ability to read data
in the FIFO, but does prevent the FIFO from receiving additional data until a
1 is written to this bit.

RESET - A 1 written to this bit resets the entire interface. The FIFOs
are cleared, and zeros are written to all bits of all three registers. (This
effectively clears the RESET command once it has been effected.)

CLK 0,1,2 - These bits set the rate at which output data is clocked
across the interface.

ODD*/EVEN - This bit selects odd or even parity across the interface.

NMSTIO - Setting this bit makes a high level on the incoming STAT 0 the
highest priority interrupt, thus giving the PC priority over any VME
interrupts. (The level of the request as passed to the 68020 is set by bit
15.)

ENINT - This is an enable for PC interrupts.

CLRINT* - A 1 written to this bit clears all PC interrupts. The bit does
not self-clear, so a 0 must be written to this bit after interrupts have been
cleared.

LSEL0,1,2 - These bits set the level of the interrupt passed to the
68020 in response to a PC interrupt request. (A request via the STAT 0 line
has its interrupt level set by bit 15 rather than by these three bits.)

STOILEV - This bit determines the interrupt level passed to the 68020
(level 3 or 7) in response to a PC interrupt request on STAT 0.

Upon reset, the VPH PC interface wakes up with zeros in all control
registers. This means that SEND and RECEIVE are disabled, the lowest data
rate is selected, ODD parity is indicated, NMSTIO on the incoming STAT0 is
disabled, all interface-generated interrupts are disabled, all interrupts are
cleared, the interface interrupt level is set to zero, and the STAT0 NMSTIO
interrupt level is set to 3. The Status and Interrupt Mask Registers are
cleared, as are both FIFOs.

A RESET may be effected by writing a "1" to bit 4 of the Control
Register.

To initialize the interface after a RESET, the required configuration
must be written to the Control and Interrupt Mask Registers. The specifics of
how the interface is configured depend upon a previously agreed upon protocol
or configuration. At the very least, the FIFOs must be enabled.

Following are a few guidelines for useful diagnostic code which has
been written for testing the VPH end interface and can be found in the
appendices.

Test Routines For PC Interface

1. Have the VPH write a few words to the interface, and verify that they
are received by the PC by reading the PC end Status Register and then reading
and verifying the received data.

2. Have the VPH monitor the STAT0 and STAT1 lines in the Status Register.
The VPH should update the STAT0 and STAT1 bits in the Control Register to echo
changes on the incoming STAT lines. The echoed STAT values may be monitored at
the PC end for verification.

3. Send several data values to the VPH. The VPH performs some simple
manipulation on the data, and writes it back to the PC for verification.

Once these tests have been run, it can be assumed that basic PC
interface operations are functional. More complex code may then be generated
for testing the various interface generated interrupt capabilities. The PC
layout of the VPH side of the PC interface is shown in Figure 30. It is a
mezzanine board.

3.2.6.2 IO Command Processor

An IO command processor (also called IO Monitor) has been generated for
the EVA system. The following list of "commands" should contain all necessary
data. For each command, the 16-bit command word will be passed first, followed
by any parameters required for that command. The order in which parameters are
passed is the same as the order in which they appear in this list.

Some of the commands on this list may need to be duplicated in the user
software in order to effect slightly different functionality. For instance,
the "transfer to VPH memory" commands should be able to handle data which is
resident in PC memory, or which is located in a disk file. The "transfer from
VPH memory" would be similar.

PCINT2

PC TO VPH COMMANDS


transfer to VPH memory (word writes)
command word = $0001
parameters: wordcount - 16 bit (this is the number of 16-bit words
                to be transferred)
            VPH starting address - 32 bit
            data type - lower bits of 16-bit word
                $0 -> 32-bit floating-point
                $1 -> 24-bit unsigned integer (sent as 32
                      bit with MSB padded with zeros)
                $2 -> 24-bit signed integer (sent as 32
                      bit with MSB padded with zeros)
                $3 -> 16-bit signed integer
                $4 -> program data (32-bit)
output: none
NOTE: data type is ignored

transfer from VPH memory (word reads)
command word = $0002
parameters: wordcount - 16 bit (this is the number of 16-bit words
                to be transferred)
            VPH starting address - 32 bit
            data type - lower bits of 16-bit word
                $0 -> 32-bit floating-point
                $1 -> 24-bit unsigned integer (sent as 32
                      bit with MSB padded with zeros)
                $2 -> 24-bit signed integer (sent as 32
                      bit with MSB padded with zeros)
                $3 -> 16-bit signed integer
                $4 -> program data (32-bit)
output: the number of 16-bit words requested in wordcount
NOTE: data type is ignored

request VME bus
command word = $0003
parameters: none
output: none

relinquish VME bus
command word = $0004
parameters: none
output: none

read DHB flag
command word = $0005
parameters: none
output: one 16-bit word (bit 6 is DHB bit)

read xCSR (byte read)
command word = $0006
parameters: address - 32-bit
output: one 16-bit word

write xCSR (byte write)
command word = $0007
parameters: address - 32-bit
            value - 8-bit (sent as 16 bit with MSB padded with zeros)
output: none

transfer from VPH to VME
command word = $0008
parameters: number of words - 16-bit (this is the number of 32-bit
                words to transfer)
            VPH start address - 32-bit
            VME start address - 32-bit
output: none

transfer from VME to VPH
command word = $0009
parameters: number of words - 16-bit (this is the number of 32-bit
                words to transfer)
            VPH start address - 32-bit
            VME start address - 32-bit
output: none

unused
command word = $000A

unused
command word = $000B

unused
command word = $000C

unused
command word = $000D

unused
command word = $000E

unused
command word = $000F

unused
command word = $0010

peek into VPH memory (longword read)
command word = $0011
parameters: address to read - 32-bit
output: one little endian 32-bit word

poke into VPH memory (longword write)
command word = $0012
parameters: address to write - 32-bit
            value - 32-bit
output: none

peek into 020 register
command word = $0013
parameters: register to read - 16-bit
                $0  -> D0
                $1  -> D1
                $2  -> D2
                $3  -> D3
                $4  -> D4
                $5  -> D5
                $6  -> D6
                $7  -> D7
                $8  -> A0
                $9  -> A1
                $A  -> A2
                $B  -> A3
                $C  -> A4
                $D  -> A5
                $E  -> A6
                $F  -> A7
                $10 -> PC
                $11 -> CCR
                $12 -> SR
                $13 -> VBR
                $14 -> SFC
                $15 -> DFC
                $16 -> CACR
                $17 -> CAAR
                $18 -> USP
                $19 -> MSP
                $1A -> ISP
output: one little endian 32-bit word

poke into 020 register
command word = $0014
parameters: register to write, size - 16-bit

                byte     word     longword   register
                $0000    $0100    $0200      D0
                $0001    $0101    $0201      D1
                $0002    $0102    $0202      D2
                $0003    $0103    $0203      D3
                $0004    $0104    $0204      D4
                $0005    $0105    $0205      D5
                $0006    $0106    $0206      D6
                $0007    $0107    $0207      D7
                $0008    $0108    $0208      A0
                $0009    $0109    $0209      A1
                $000A    $010A    $020A      A2
                $000B    $010B    $020B      A3
                $000C    $010C    $020C      A4
                $000D    $010D    $020D      A5
                $000E    $010E    $020E      A6
                $000F    $010F    $020F      A7
                $0010    $0110    $0210      PC
                $0011    $0111    $0211      CCR
                $0012    $0112    $0212      SR
                $0013    $0113    $0213      VBR
                $0014    $0114    $0214      SFC
                $0015    $0115    $0215      DFC
                $0016    $0116    $0216      CACR
                $0017    $0117    $0217      CAAR
                $0018    $0118    $0218      USP
                $0019    $0119    $0219      MSP
                $001A    $011A    $021A      ISP

                NOTE: pokes to CCR & SR are always word operations.
                Pokes to VBR, SFC, DFC, CACR, CAAR, USP, MSP, and ISP
                are always longword operations. The VPH command
                processor will accept any size for these registers, but
                will always utilize the correct sizing when carrying
                out the poke.

            value - 32-bit (only the lower byte or word is used for
                byte or word writes)
output: none
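
The register/size parameter of the poke command follows a regular pattern: the low byte is the register code used by the peek command and the high byte is the size (0 = byte, 1 = word, 2 = longword). A minimal Python sketch of that encoding follows; the names `poke_selector`, `REGISTERS`, and `SIZE_CODES` are hypothetical and chosen for this example only.

```python
# Register codes $00-$1A in the order used by the peek/poke commands
REGISTERS = (
    ["D%d" % n for n in range(8)] +
    ["A%d" % n for n in range(8)] +
    ["PC", "CCR", "SR", "VBR", "SFC", "DFC", "CACR", "CAAR", "USP", "MSP", "ISP"]
)

SIZE_CODES = {"byte": 0, "word": 1, "longword": 2}


def poke_selector(register, size):
    """Return the 16-bit 'register to write, size' parameter for command $0014."""
    return SIZE_CODES[size] << 8 | REGISTERS.index(register)
```

For instance, a word-sized poke into A3 uses the selector $010B. Per the note in the command description, the command processor applies the correct size for CCR, SR, and the control registers regardless of the size requested.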

reset VPH
command word = $0015
parameters: none
output: none

reset PC Interface
command word = $0016
parameters: none
output: none


initialize PC Interface
command word = $0017
parameters: control register value - 16-bit
output: none

set PC Interface interrupt mask
command word = $0018
parameters: mask value - 16-bit
output: none

read PC Interface status register
command word = $0019
parameters: none
output: one 16-bit word

read PC Interface interrupt register
command word = $001A
parameters: none
output: one 16-bit word

read VPH status latch
command word = $001B
parameters: none
output: one 16-bit word

write VPH status latch
command word = $001C
parameters: status latch value - 16-bit
                bits 0,1 are status bits
                bits 4,5,6,7 are Zoran 1,2,3,4 interrupt flags
                all other bits are don't cares
output: none

write Zoran reset latch
command word = $001D
parameters: reset latch value - 16-bit
                bits 0,1,2,3 are reset flags for Zoran 1,2,3,4
output: none

load DSACK SRAM
command word = $001E
parameters: address - 32-bit
                the vector A[31,24..18] addresses the SRAM;
                all other bits are don't cares
            value - 16-bit
                the lowest nibble goes into SRAM;
                all other bits are don't cares
output: none

execute starting at address
command word = $001F
parameters: start address - 32-bit (enter LSW first)
output: none

transfer PC Interface to VPH memory (longword writes)
command word = $0020
parameters: longword count - 16-bit
                the number of 32-bit words to transfer
            start address - 32-bit
                the starting address in VPH (entered LSW first)
            data type - 16-bit (ignored)
output: none

transfer VPH memory to PC Interface (longword reads)
command word = $0021
parameters: longword count - 16-bit
                the number of 32-bit words to transfer
            start address - 32-bit
                the starting address in VPH (entered LSW first)
            data type - 16-bit (ignored)
output: the number of longwords requested in longword count
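
To make the command framing concrete, the Python sketch below builds the sequence of 16-bit words for the longword transfer command ($0020): command word first, then the parameters in listed order, with the 32-bit start address split least significant word first as the command description states. The function names are hypothetical, and the payload data that would follow the header is not shown.

```python
def split_lsw_first(value32):
    """Split a 32-bit value into two 16-bit words, least significant word first."""
    return [value32 & 0xFFFF, (value32 >> 16) & 0xFFFF]


def longword_write_header(count, start_address, data_type=0):
    """Return the 16-bit words that open a 'transfer PC Interface to VPH
    memory (longword writes)' command."""
    words = [0x0020]                     # command word
    words.append(count & 0xFFFF)         # longword count, 16-bit
    words += split_lsw_first(start_address)
    words.append(data_type & 0xFFFF)     # data type, ignored by the VPH
    return words
```

Each word in the returned list would be written in turn to the Write FIFO at the PC end.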

3.2.7  HSIO  Configuration 

Each board within a CPH system has a small array of registers whose
purpose is to allow downloading of configuration data and to provide a
mechanism for the communication of control information. Some of these
registers are not registers in the true sense of the word, but provide the
functionality needed for the required range of special communication tasks.
A description of these registers as they must appear, for example, on the
cache memory boards follows. The HSIO is the information highway for this
communication.

The HSIO bus also carries control and status information about the
configuration of the current CPH system. This status information consists of
the number of cache memory banks, the number of CPH processor boards
installed, and other such information. That status will be contained in the
CPH processor status word, which will operate as shown in Figure 31.

HSIO LINEAR ADDRESS SPACE/IO SPACE

The HSIO bus can access a 24-bit address space. This "linear address
space" will be used to access resources in all of the CPH systems the IOP
serves. In order to be able to access configuration information on any board
in any system, an additional address space, referred to as the "IO Space," has
been added. The IO Space will simplify system mapping and access to
configuration/communication registers. A control bit on the HSIO bus will
indicate when an IO Space access is to occur, as opposed to an access to the
Linear Address Space. This line will be an active low line which when
asserted dictates an access to the IO Space. This line is named the /HSIOMEM
line.

When /HSIOMEM is asserted, the address put on the bus will have the
following format:


Bit:    23 ... 11   10   9    8    7    6    5    4    3    2    1    0
Field:  X  ...  X   S2   S1   S0   B4   B3   B2   B1   B0   R2   R1   R0

Bits  11  through  23  are  don’t  care: 

Bits  8,  9,  &  10  (S[2:0])  are  the  System  Address  bits.  These  bits  select 
one  of  eight  possible  systems. 

Bits 3 through 7 (B[4:0]) are the Board Address bits. These bits select
one of thirty-two possible boards within a system.

Figure 31. CPH Status Word

Bits 0, 1, & 2 (R[2:0]) are the Register Address bits. These bits
select one of eight possible registers on a given board.

Each board will need a system of switches and/or jumpers to set the
system and board addresses for that particular board.
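
The IO Space address format can be sketched in a few lines of Python. This is only an illustration of the bit assignments given above; the function names are hypothetical and are not part of any CPH software.

```python
# Field layout of an IO Space address (valid when /HSIOMEM is asserted):
#   bits 10:8 = S[2:0] system, bits 7:3 = B[4:0] board, bits 2:0 = R[2:0] register.
# Bits 23:11 are don't care.

def io_space_address(system, board, register):
    """Pack system, board, and register numbers into an IO Space address."""
    if not 0 <= system < 8:
        raise ValueError("system must be 0..7")
    if not 0 <= board < 32:
        raise ValueError("board must be 0..31")
    if not 0 <= register < 8:
        raise ValueError("register must be 0..7")
    return (system << 8) | (board << 3) | register


def decode_io_space_address(address):
    """Return (system, board, register), ignoring the don't-care bits 23:11."""
    return (address >> 8) & 0x7, (address >> 3) & 0x1F, address & 0x7
```

For example, system 5, board 17, register 3 packs to 58Bh, and any value in the don't-care bits decodes to the same three fields.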

On the backplane, a bit similar to /HSIOMEM exists. This is the /CONFIG
microprogram bit which, when asserted, indicates that the address on Port A/C
is destined for the configuration registers rather than the general address
space of the CPH system. Data to be written to the configuration registers
will be written on Port C and data read from the registers will appear on
Port A. The /WRCAr and /RDA microprogram bits will be used to determine a
processor configuration write and read, respectively.

REGISTER  DESCRIPTION 

Each of the registers within the IO Space on a particular board is a
16-bit register. Since all data paths are 32-bit paths, the convention will
be adopted of using the least significant 16 bits of a given path when
accessing an IO Space register. In addition, in the case of a complex (64-bit
real/imaginary) path, the real portion of the path will be utilized.

The upper two registers are 16-bit mailbox registers which are
accessible from the HSIO bus and the backplane. The register located at the
board base address + 4 is accessible from the HSIO bus only. The register at
base address + 5 is accessible from the backplane only. Each of these
registers is read/write from its respective bus.

The register at the board base address is a read-only location which
contains ID information for that board. This register is accessible from
either the HSIO or the backplane. The format of the register is:

Bits 0:3 - a 4-bit board ID code.

Bits 4:7 - a 4-bit memory size code.

Bits 8:11 - a 4-bit block size code.

Bits 12:15 - a 4-bit read latency time code.

These bits may be hard-wired. However, since the codes have not yet been
defined, and to allow for future redefinition, these 16 bits will be set
with jumpers.
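
As a minimal sketch, the ID register fields can be separated as follows. Since the codes themselves have not been defined, the function (a hypothetical name) only extracts the raw 4-bit values and does not interpret them.

```python
def decode_board_id_register(value):
    """Split the 16-bit board ID register into its four 4-bit codes."""
    value &= 0xFFFF  # only the least significant 16 bits of the path are used
    return {
        "board_id": value & 0xF,              # bits 0:3
        "memory_size": (value >> 4) & 0xF,    # bits 4:7
        "block_size": (value >> 8) & 0xF,     # bits 8:11
        "read_latency": (value >> 12) & 0xF,  # bits 12:15
    }
```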

The register located at the base address + 1 is important. This
register is a compound, special-purpose read/write register. Eight bits are
semaphore bits, and eight bits are a "mailbox" register for passing control
information between the HSIO and the backplane. A description of how the
semaphores and mailbox must work follows.

Bit 0 is a system interrupt bit. This bit must therefore be passed
through an inverting high-drive open-collector driver to the appropriate
System Interrupt line on the HSIO. Again, jumpers will be used for routing
this bit to the appropriate System Interrupt line.

Bit 1 is defined as "/VALID." This bit is active low and indicates whether
P/H (Bit 2) is valid. This bit is read/write from both the HSIO and the
backplane.

Bits 2:4 of this register are semaphores which are set by the
backplane and cleared by the HSIO. When a write to this register from the
backplane occurs, a zero in any bit position causes the corresponding bit in
the register to remain unchanged; a one in any bit position causes the
corresponding bit in the register to be set (to one). When a write from the
HSIO occurs, a zero in any bit position causes the corresponding bit in the
register to remain unchanged; a one in any position causes the corresponding
bit in the register to be cleared (set to zero). A read from either bus
simply returns the state of the three bits. Bit 2 is defined as "P/H" and
indicates control of the cache board. If Bit 2 is low, the HSIO has control
of the board, but if the bit is high, the processor has control of the board.
Bit 1 is used to determine if the state of this bit is valid. Bits 3:4 are
undefined, general purpose semaphores.

Bits 5:7 of this register behave just as Bits 2:4, except that they are
set from the HSIO and cleared from the backplane. All three of these bits
are undefined, general purpose semaphores.
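
The set/clear rule for the semaphore bits can be modeled in software as shown below. This is a behavioral sketch only, assuming the bit positions given above; the class and constant names are hypothetical, and bits 0 and 1 are not modeled.

```python
# Bit masks within the low byte of the register at base address + 1.
BACKPLANE_SET_BITS = 0b00011100  # bits 2:4, set by backplane, cleared by HSIO
HSIO_SET_BITS = 0b11100000       # bits 5:7, set by HSIO, cleared by backplane
P_H_BIT = 0b00000100             # bit 2: high = processor has control


class SemaphoreBits:
    """Behavioral model of semaphore bits 2:7."""

    def __init__(self):
        self.value = 0

    def write(self, bus, data):
        """A one sets or clears a bit, depending on the bus; a zero leaves it alone."""
        if bus == "backplane":
            set_bits, clear_bits = BACKPLANE_SET_BITS, HSIO_SET_BITS
        elif bus == "hsio":
            set_bits, clear_bits = HSIO_SET_BITS, BACKPLANE_SET_BITS
        else:
            raise ValueError("bus must be 'hsio' or 'backplane'")
        self.value |= data & set_bits
        self.value &= ~(data & clear_bits)

    def read(self):
        """A read from either bus simply returns the current state."""
        return self.value
```

Because each bus can only set one group and clear the other, neither side needs a read-modify-write sequence to hand a semaphore to the other side.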

Bits 8:15 of this register form a mailbox between the HSIO and
the backplane. That is, these eight bits are read/write from either bus.
When a read occurs, the bits retrieved reflect the most recent write from the
other bus. A write from one bus will not overwrite the most recent write from
the other bus. This behavior is achieved with two 8-bit registers in parallel,
oriented in opposite directions. An HSIO read or backplane write accesses
one register; an HSIO write or backplane read accesses the other.
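
The two-register mailbox can be sketched as a pair of one-way slots. The class below is a hypothetical software model of that behavior, not a description of the hardware implementation.

```python
class Mailbox:
    """Two 8-bit registers in parallel, one for each direction."""

    _BUSES = ("hsio", "backplane")

    def __init__(self):
        # Each slot holds the most recent byte written by that bus.
        self._written_by = {"hsio": 0, "backplane": 0}

    def write(self, bus, value):
        """Store a byte in the slot owned by the writing bus."""
        if bus not in self._BUSES:
            raise ValueError("bus must be 'hsio' or 'backplane'")
        self._written_by[bus] = value & 0xFF

    def read(self, bus):
        """Return the most recent byte written by the other bus."""
        if bus not in self._BUSES:
            raise ValueError("bus must be 'hsio' or 'backplane'")
        other = "backplane" if bus == "hsio" else "hsio"
        return self._written_by[other]
```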

An interesting aspect of these registers' behavior is that access from
the backplane to any of these registers is achieved by qualification of a
bank address placed on the backplane with the /CONFIG bit asserted. When a
valid bank address is presented during a READ cycle, only the least
significant board at offset zero responds to the read request. During a
WRITE, however, the data presented is written to ALL boards within that bank.
The reason for this is that the processor views memory as banks with a
maximum depth of 256k; it has no concern that there may be multiple boards
within a bank. The IOP, on the other hand, has no conception of "banks" of
memory: each board is a separate entity, regardless of what bank it belongs
to, or whether it is configured as cache or Auxiliary. This means that any
"message" to be passed from the IOP to the processor must be written to the
correct board (least significant, offset zero). It will therefore be up to
the programmer to keep track of such details.

The registers located at the base address + 2 and + 3 are configuration
registers. These registers are loaded via the HSIO bus with information which
assigns each of the blocks on the cache board a cache and/or Auxiliary memory
bank address and offset into the block. Another bit per block assigns most or
least significant status, and another bit selects the board as cache or
Auxiliary memory. Bits are assigned as follows:

Bits 0:3 - Bank Address

Bits 4:7 - Offset Block 0

Bits 8:11 - Offset Block 1

Bit 12 - MSB/LSB Block 0

Bit 13 - MSB/LSB Block 1

Bit 14 - Aux/Cache

Bit 15 - Undefined
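
A configuration word with this layout could be assembled as in the sketch below. The function name is hypothetical, and because the text does not state the polarity of the MSB/LSB and Aux/Cache bits, they are treated here as raw bit values.

```python
def pack_config_register(bank, offset0, offset1, msb_lsb0, msb_lsb1, aux_cache):
    """Build a 16-bit configuration word; bit 15 is undefined and left at zero."""
    for name, field in (("bank", bank), ("offset0", offset0), ("offset1", offset1)):
        if not 0 <= field <= 0xF:
            raise ValueError(name + " must fit in 4 bits")
    return (
        bank                              # bits 0:3  bank address
        | (offset0 << 4)                  # bits 4:7  offset, block 0
        | (offset1 << 8)                  # bits 8:11 offset, block 1
        | (int(bool(msb_lsb0)) << 12)     # bit 12    MSB/LSB, block 0
        | (int(bool(msb_lsb1)) << 13)     # bit 13    MSB/LSB, block 1
        | (int(bool(aux_cache)) << 14)    # bit 14    Aux/Cache
    )
```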

3.2.8  Crossbar 

In order to minimize chip count and processor board space, a crossbar
chip study was started. In December of 1989, AMCC formally quoted to STC
their development costs for the ASIC crossbar design. A design by the
customer through netlist was quoted at $85,000 with a 14-week schedule. A
design by the customer (STC) at AMCC was quoted at $95,000 with a 14-week
schedule. A custom 4:1 mux with input enable was quoted at $10,000 with 4
weeks delivery. Production prices were $750 per piece for up to 25
prototypes and $504 in quantities of 100-499. They specified an 80 MHz
clock in a 301 PGA configuration using BiCMOS. Space Tech then pursued ILSI
more aggressively for their more economical ASIC design.

The new chip was developed in cooperation with ILSI, at an NRE cost of
$35,000, as an innovative crossbar switch that is particularly well suited
for high-speed, multiprocessor, microprogrammable, pipelined environments.
It is now described.

This crossbar differs from others currently available in that it
combines high speed (40 MHz) with a large number of ports (12 by 14), all
control lines are separately accessible, and it has an internal multiported,
configurable register file.

The XB1210-40C crossbar switch is an ASIC fabricated with 1-micron CMOS
technology. All pins use standard TTL levels. The device is packaged in a
256-pin PGA and supports Control Clock rates up to 40 MHz. It supports
two-phase operation by means of two independent data clocks which are used
to clock the output port pipeline registers.

This crossbar has 10 dedicated input ports, 12 dedicated output ports
and 2 bidirectional ports. Each output port can access data from any input
port. All ports are 4 bits wide externally and all internal data paths are
8 bits wide. Input ports have a 4-bit demultiplexing latch and output ports
have a multiplexer to choose the least significant or most significant bits
from the pipeline. This device is particularly well suited to architectures
employing the BIT Multiplier/ALU chipset, where 8 crossbar chips may be
paralleled to achieve a crossbar system that is 32 bits wide externally and
64 bits wide internally.

All output ports are pipelined with a pair of parallel registers, one
for the first phase and another for the second phase. A control line is
provided for each output port to select data from either register. These
pipeline registers are clocked with two clocks, First Phase Clock and Second
Phase Clock. The Second Phase Clock may be tied low for single phase
operation. All control lines are selectively pipelined and may be clocked
using the Control Clock, which is also used to clock the register file.

Since all control lines may be accessed simultaneously, the entire
crossbar may be reconfigured every clock cycle, as opposed to requiring many
cycles to set up paths as in crossbars where the control signals are bused
together.

The most distinctive feature of the XB1210-40C is an internal multiported,
configurable register file. This register file is a four-port synchronous
static RAM organized as 64 words by 8 bits. It can also be used
asynchronously by tying the Control Clock low. Each port has its own address
and all ports may be used simultaneously. Each register file port may be
accessed by any of the crossbar input ports. The register file may be
configured in different ways: as normal static RAM, as 8 pipeline registers 8
deep, 4 pipeline registers 16 deep, 2 pipeline registers 32 deep, or as a
circular buffer. Figures 32 to 35 depict shift modes 1, 2, and 3, and the
XBAR to GPR data paths. The operating modes (non-pipelined synchronous,
non-pipelined asynchronous, and pipelined synchronous) are described later.

The crossbar consists of four major components: input ports, output
ports, multiplexers, and a four-port register file. All internal data paths
are 8 bits wide while all I/O ports are 4 bits wide. Demultiplexing latches
are provided on all input ports and multiplexers are used on all output
ports. This architecture provides high speed and compatibility with various
processors.

INPUT  PORTS 

The crossbar has ten dedicated input ports (I1_[0..3] to I10_[0..3]) and
two bidirectional ports (IO11_[0..3] and IO12_[0..3]). Each input port has a
4-bit demultiplexing latch and an MSWEN control input associated with it. The
most significant 4 bits of data are presented to the input port while MSWEN
is brought high. MSWEN should then be brought low. Finally, the least
significant four bits should be presented to the input port and held. This
provides the 8-bit word presented to the internal bus.
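
The input sequence can be illustrated with a small behavioral model. It is a sketch under one stated assumption: the least significant nibble is taken straight from the pins, while the most significant nibble comes from the latch, which follows the pins only while MSWEN is high. The class name is hypothetical.

```python
class InputPort:
    """Behavioral model of one 4-bit input port with its demultiplexing latch."""

    def __init__(self):
        self._latched_msn = 0  # most significant nibble held by the latch

    def drive(self, nibble, mswen):
        """Apply a nibble to the pins and return the 8-bit internal word."""
        nibble &= 0xF
        if mswen:
            # Latch is transparent while MSWEN is high: it follows the pins.
            self._latched_msn = nibble
        return (self._latched_msn << 4) | nibble
```

Driving A hex with MSWEN high and then 5 hex with MSWEN low presents A5 hex to the internal bus.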

MULTIPLEXERS 

After passing through the input ports, data is passed onto an internal
bus. This bus is 112 bits wide: 8 bits for each of the twelve input ports
and 8 bits for each of the two register file read ports. Any 8-bit path of
this bus may be selected by the multiplexers as the data source for the
fourteen output ports or two register file write ports.

Each output port has four select lines SELn[0..3], where n is the port
number. The value placed on these inputs determines the source of the data to
be sent to the output port registers. As an example, placing a hex value of
"5" on any set of SEL inputs will select input port 5 as the data source. In
addition, a hex value of "F" will disable the output port and a hex value of
"0" will select the output port register as the data source. This will cause
the port's registers to hold their current state. The port's registers will
also hold their state when the output is disabled with an "F". The
multiplexer select inputs for the register file write ports (SELA[0..3] and
SELB[0..3]) are similar to the ones for the output ports; however, a hex value
of "0" will send all zeros to the register file and a hex value of "F" will
send all ones.
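
The select codes can be summarized as a lookup. The sketch below uses the codes stated in this section (0, 5, F) and the register file read port codes D and E given in the REGISTER FILE section; it assumes, by extension of the "5 selects input port 5" example, that codes 1 through C select input ports 1 through 12. The function name is hypothetical.

```python
def decode_select(code, register_file_port=False):
    """Describe the data source chosen by a 4-bit SEL code.

    register_file_port=True applies the SELA/SELB rules for codes 0 and F.
    """
    if not 0 <= code <= 0xF:
        raise ValueError("select code must be a 4-bit value")
    if code == 0x0:
        return "all zeros" if register_file_port else "hold"
    if code == 0xF:
        return "all ones" if register_file_port else "output disabled"
    if code == 0xD:
        return "RPA"  # register file read port A
    if code == 0xE:
        return "RPB"  # register file read port B
    return "input port %d" % code  # assumed mapping for codes 1..C
```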

OUTPUT  PORTS 

Each output port (O1_[0..3] to O14_[0..3]) and each I/O port
(IO11_[0..3] and IO12_[0..3]) has two multiplexers and two 8-bit registers.
The first multiplexer, described above, selects the source of data presented
to the output registers. These registers are clocked by separate, anti-phase
clocks. The phase 1 register is clocked by the low-to-high transition of
CLK1, and the phase 2 register is similarly clocked by CLK2. The outputs
from these registers are then input to the second multiplexer.

The second multiplexer has two control lines, PSEL and MSWSEL, which are
used to select 4 bits for the output buffer. A low level on PSEL selects data
from the phase 1 register while a high level selects data from the phase 2
register.

The MSWSEL input selects between the most and least significant 4-bit
nibbles. A low level on MSWSEL selects the 4 least significant bits to be
output.
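
The second multiplexer reduces to two choices, which the following sketch makes explicit. The function name is hypothetical; the high level of MSWSEL selecting the most significant nibble follows the control signal list in Figure 36.

```python
def output_nibble(phase1_reg, phase2_reg, psel, mswsel):
    """Return the 4 bits driven onto an output port.

    psel low selects the phase 1 register, high selects phase 2.
    mswsel low selects the least significant nibble, high the most significant.
    """
    word = phase2_reg if psel else phase1_reg
    return (word >> 4) & 0xF if mswsel else word & 0xF
```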

REGISTER  FILE 

The register file is a four-port synchronous static RAM organized
as an 8 by 8 array of 8-bit registers. These registers are clocked by the
rising edge of CLK3. The register file has two read ports (RPA and RPB) and
two write ports (WPA and WPB). Each port has its own address and all ports
may be used simultaneously. Writing to the same location from both write
ports simultaneously is allowed. Whenever this happens, the data from WPA is
used.


The write address inputs are WRA_[0..5] and WRB_[0..5]. Each write port
also has an active low enable, /WRENA or /WRENB. The read address inputs are
RDA_[0..5] and RDB_[0..5]. The data read from the register file may be
accessed by any output port or be written back into the register file. A hex
value of "D" placed on any output port's SEL select lines will select RPA and
a value of "E" will select RPB.

REGISTER  FILE  SHIFT  MODES 

Inputs SM1 and SM0 are used to configure the register file as a shift
register. When both of these inputs are low, the register file functions like
a normal static RAM. When SM0 is brought high while SM1 remains low, each row
of the register file becomes an eight-deep shift register. Writing to the
first register of a row causes the shift: each of the seven remaining
registers of the row is written with the data from the preceding register,
and the old data in the last register is lost. Writing to a register other
than the first register only updates that specific register. Reading never
modifies any data.

Bringing SM1 high while leaving SM0 low links pairs of rows to give a
configuration of four shift registers, each 16 registers deep. Bringing both
SM1 and SM0 high links four rows together, yielding two shift registers, each
32 registers deep.
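
The shift modes can be modeled in software as shown below. This is a sketch under stated assumptions that the text does not confirm: rows occupy consecutive addresses (row r starts at address 8 x r), linked rows are adjacent, and in the linked modes only a write to the first register of the whole chain triggers a shift. The class and table names are hypothetical.

```python
# Chain depth for each (SM1, SM0) setting; None means normal static RAM.
CHAIN_DEPTH = {(0, 0): None, (0, 1): 8, (1, 0): 16, (1, 1): 32}


class RegisterFile:
    """Behavioral model of the 64-word by 8-bit register file (one write port)."""

    def __init__(self):
        self.cells = [0] * 64

    def write(self, address, data, sm1=0, sm0=0):
        depth = CHAIN_DEPTH[(sm1, sm0)]
        if depth is not None and address % depth == 0:
            # Write to the first register of a chain: shift the chain by one.
            chain = self.cells[address:address + depth]
            self.cells[address:address + depth] = [data & 0xFF] + chain[:-1]
        else:
            # Normal RAM write, or a write to a register inside a chain.
            self.cells[address] = data & 0xFF

    def read(self, address):
        """Reading never modifies any data."""
        return self.cells[address]
```

Writing 1, 2, and 3 in turn to address 0 with SM0 high leaves 3, 2, 1 in the first three registers of row 0, which is the pipeline-register behavior described above.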

OPERATING  MODES 

The crossbar has three possible modes of operation: non-pipelined
synchronous, non-pipelined asynchronous, and pipelined synchronous. The MODE
input selects whether certain other inputs pass through input pipeline
registers or bypass them. The affected inputs are:

SELx[0..3], WRA_[0..5], WRB_[0..5], RDA_[0..5], RDB_[0..5], PSELx,
SELA_[0..3], SELB_[0..3], /WRENA, /WRENB, SM1, and SM0. Inputs which are not
affected are: Ix[0..3], MSWENx, and MSWSELx.

A low level on MODE causes all affected inputs to bypass the input
pipeline registers. With CLK3 left running, non-pipelined synchronous mode
operation is achieved. This is the normal mode of operation and no special
considerations are involved.

If CLK3 is tied low while MODE is held low, non-pipelined asynchronous
operation is invoked. In this mode, the register file registers are clocked
with the rising edge of /WRENA or /WRENB. Asynchronous register file writes
can therefore be accomplished in this mode. Operation of the input ports,
output ports, and multiplexers is unaffected by the absence of CLK3.

If MODE is brought high, pipelined synchronous mode operation is
selected and CLK3 must be left running, because CLK3 is used to clock the
input pipeline registers. The main consideration in this mode of operation
is that the affected inputs must be presented to the crossbar one CLK3
cycle sooner, and slightly different set-up and hold times may be involved.
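
The three modes follow from just two conditions, the level on MODE and whether CLK3 is running. A minimal sketch of that decision, with a hypothetical function name:

```python
def operating_mode(mode, clk3_running):
    """Name the crossbar operating mode for a MODE level and CLK3 state."""
    if mode:
        if not clk3_running:
            # CLK3 clocks the input pipeline registers, so it cannot be stopped.
            raise ValueError("pipelined mode requires CLK3 to be running")
        return "pipelined synchronous"
    if clk3_running:
        return "non-pipelined synchronous"
    return "non-pipelined asynchronous"
```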

A number of important control signals are listed next in Figure 36.
Register file and port control follow in Figure 37. Timing charts for the
mode 0 operations can then be found in Figures 38 through 43. These data
sheets formed the specifications for contracting the fabrication effort out
to ILSI in Colorado Springs. Testing of the crossbars was accomplished at
ILSI and later at Space Tech. The same test vectors used by ILSI were run on
our Emulyzer to verify the ILSI tests. Those vectors can be found in the
ILSI manual for the crossbars.

CLK1 - Active high clock for phase one output port registers.

CLK2 - Active high clock for phase two output port registers.

CLK3 - Active high clock for register file and control input pipeline
registers.

MODE - Bypasses control input pipeline registers when low.

I1_[0..3] to I10_[0..3] - Data input ports to the crossbar and register
file.

MSWEN1 to MSWEN12 - Controls input port demultiplexing latches. Latches are
transparent when high.

SEL1_[0..3] to SEL14_[0..3] - Select inputs for output port registers.

PSEL1 to PSEL14 - Selects phase one register for output when low and phase
two when high.

MSWSEL1 to MSWSEL14 - Multiplexer for output ports. Selects most
significant four bits when high.

O1_[0..3] to O10_[0..3], O13_[0..3] to O14_[0..3] - Data output ports from
crossbar and register file.

IO11_[0..3], IO12_[0..3] - Bidirectional data ports.

SELA[0..3] - Select inputs for register file write port A.

SELB[0..3] - Select inputs for register file write port B.

/WRENA - Active low write enable for register file port A.

/WRENB - Active low write enable for register file port B.

WRA[0..5] - Address inputs for register file write port A.

WRB[0..5] - Address inputs for register file write port B.

RDA[0..5] - Address inputs for register file read port A.

RDB[0..5] - Address inputs for register file read port B.

SMODE0, SMODE1 - Shift mode control inputs for register file.

Figure 36. Control Signals

Output Port Control

Input Port to Output Port Transaction for CLK1 (PSEL=0)

PARAMETER  DESCRIPTION                            MIN  MAX  UNITS

tSU1       Input Data to MSWEN LOW Set-up                   ns
tHD1       Input Hold from MSWEN LOW                        ns
tSU2       Input Data to CLK1 HIGH Set-up                   ns
tHD2       Input Hold from CLK1 HIGH                        ns
           Set-up from MSWEN LOW to CLK1 HIGH               ns
tSU3       SEL Inputs to CLK3 HIGH Set-up                   ns
tHD3       SEL Inputs Hold from CLK3 HIGH                   ns
           CLK1 HIGH to Output Data Valid                   ns
           Output Data Hold from CLK1 HIGH                  ns

Figure 38. Timing Charts

Input Port to Output Port Transaction for CLK2 (PSEL=1)

Figure 39. Timing Charts

Output Port Control

Figure 41. Timing Charts

Register File Read

Figure 43. Timing Charts

3.2.8.1 Testing the Crossbars

Characterization tests were performed by ILSI at ILSI before shipment to
Space Tech. Those test sequences and vectors are listed in the ILSI
specifications manual under separate cover. Verification tests were performed
at Space Tech with a Hi-Level Emulyzer connected to the input and output
ports of each device. The same vectors were used at Space Tech as were used
at ILSI to confirm the operation of each device. Of the ten devices shipped
to us, only one failed; it was dead on arrival. It was replaced by ILSI
after they confirmed our results. The vectors used by Space Tech and ILSI
set up alternating, repeating 1s and 0s in adjacent bits so that crosstalk
could be discovered. Clocks were adjusted from 1 to 20 MHz and the chips
passed at all clock rates except 20 MHz in some modes. Those modes are not
used in the CPH, so they were not important. The important modes were the
mode 0 modes, and all chips passed these mode tests at all clock speeds.

The typical test setup of vectors used is shown in the following sheet
from the engineer's notebook in Figure 44. Here, we can see that read and
write ports A and B were activated along with the several input data control
lines and output data control lines. The testing took approximately 4 hours
per device since 12x14 combinations of configurations were to be tested by
numerous test vectors. The Space Tech test fixture is shown in the next
drawing as Figure 45. The test fixture uses the pinout assignments for the
crossbar chip as shown in Figure 46. A 6D Mupac VMS board was used with PALs
and registers to clock test signals and controls onto the crossbar under
test.

A PAL function, XBARIM.PDS, was created for the test jig to input data
into the I/O ports in a pipelined, synchronous manner. The test vectors of
mode 0 could be used in testing mode 1 with the following modifications. The
write pulse had to be shifted from the least significant vectors to the most
significant positions. The write pulse had to be widened by several
nanoseconds (accomplished by modifying XBAR2.PDS to include an additional
input, namely async). The input data to the I/O ports had to be shifted one
cycle sooner to offset the additional pipelining the PALs now present.
Finally, the SELx data of any F's (which set output PORTx to high impedance)
had to be shifted one cycle sooner also (due to mode 1 internal pipelining
of SELx data).

With the modifications described and one new set of vectors to test all
of the internal pipelining, six sets of vectors were used to test mode 1
operation. After output reference files were created against which to
compare XBAR outputs, testing of the XBAR chips commenced in earnest.

While testing the XBAR, some sets of vectors ran better if a different
amount of delay was used between CLK3 and PGCLK. Thus, a "gate delay line"
was introduced to the jig to allow selective clock skewing. The delays needed
for optimum testing are listed in the Engineer's Notebook, which gives the
complete testing procedure.

The result of testing was that 9 of 10 chips ran all 11 sets of test
vectors with no erroneous output. The tenth chip, however, did not
successfully run even one set of vectors. Several clock speeds and skews were
tried without any improvement. The chip was then packaged up and sent
back to ILSI for replacement.

WORD FORMAT:

MSN  HEX1/HEX2/HEX3/ ... /HEX34/HEX35/HEX36  LSN

NIBBLE LEGEND:

HEX1  = /WCA, XXX, WRA-5, WRA-4
HEX2  = WRA-3, WRA-2, WRA-1, WRA-0
HEX3  = SELA-3, SELA-2, SELA-1, SELA-0        - WRITE PORT A CONTROL

HEX4  = /WCB, XXX, WRB-5, WRB-4
HEX5  = WRB-3, WRB-2, WRB-1, WRB-0
HEX6  = SELB-3, SELB-2, SELB-1, SELB-0        - WRITE PORT B CONTROL

HEX7  = XXX, XXX, RDA-5, RDA-4
HEX8  = RDA-3, RDA-2, RDA-1, RDA-0            - READ PORT A CONTROL

HEX9  = XXX, XXX, RDB-5, RDB-4
HEX10 = RDB-3, RDB-2, RDB-1, RDB-0            - READ PORT B CONTROL

HEX11 = SEL1-3, SEL1-2, SEL1-1, SEL1-0
HEX12 = SEL2-3, SEL2-2, SEL2-1, SEL2-0
HEX13 = SEL3-3, SEL3-2, SEL3-1, SEL3-0
HEX14 = SEL4-3, SEL4-2, SEL4-1, SEL4-0
HEX15 = SEL5-3, SEL5-2, SEL5-1, SEL5-0
HEX16 = SEL6-3, SEL6-2, SEL6-1, SEL6-0
HEX17 = SEL7-3, SEL7-2, SEL7-1, SEL7-0        - OUTPUT DATA
HEX18 = SEL8-3, SEL8-2, SEL8-1, SEL8-0
HEX19 = SEL9-3, SEL9-2, SEL9-1, SEL9-0
HEX20 = SEL10-3, SEL10-2, SEL10-1, SEL10-0
HEX21 = SEL11-3, SEL11-2, SEL11-1, SEL11-0
HEX22 = SEL12-3, SEL12-2, SEL12-1, SEL12-0
HEX23 = SEL13-3, SEL13-2, SEL13-1, SEL13-0
HEX24 = SEL14-3, SEL14-2, SEL14-1, SEL14-0

HEX25 = I1-3, I1-2, I1-1, I1-0
HEX26 = I2-3, I2-2, I2-1, I2-0
HEX27 = I3-3, I3-2, I3-1, I3-0
HEX28 = I4-3, I4-2, I4-1, I4-0
HEX29 = I5-3, I5-2, I5-1, I5-0
HEX30 = I6-3, I6-2, I6-1, I6-0                - INPUT DATA
HEX31 = I7-3, I7-2, I7-1, I7-0
HEX32 = I8-3, I8-2, I8-1, I8-0
HEX33 = I9-3, I9-2, I9-1, I9-0
HEX34 = I10-3, I10-2, I10-1, I10-0
HEX35 = IO11-3, IO11-2, IO11-1, IO11-0
HEX36 = IO12-3, IO12-2, IO12-1, IO12-0

Figure 44. Engineer's Notebook Sheet

CROSSBAR
TOP VIEW
256 PIN PGA

Figure 46. Crossbar Pinout

3.2.9 CPH Microsequencer


As with many of the other "glue logic" functions, a microprogram
sequencer chip fast enough for the EVA architecture was not available in 1990.
A sequencer that can also support relative addressing and interrupts was
required. Several are available now but they remain too slow. Available
sequencers that can handle the high speed don't support interrupts or the
necessary addressing modes. One solution was to build the sequencer out of
high speed PALs and logic chips. An architecture that could be built from
available parts was designed. The problem with this approach is that over 50
chips are required. A few components could be added to one of the simple
sequencer chips to support the required addressing modes. This would reduce
the part count but the combined delay would be too great to meet the high
speed requirement. Fortunately, IDT developed a suitable part by 1991.

The CPH Microprogram Sequencer (CPH-MS) is designed to perform its
function in a 50 nsec maximum cycle time. Although the timing analysis is not
complete, a preliminary analysis of the critical timing paths (those paths
which pass through the slowest and/or greatest number of components)
indicates that they meet the timing criteria. The microinstruction set that
has been selected is:


INITIALIZATION

    Load Loop Counter               16-bit count
    Load Stack Pointer              10-bit address
    Load Subroutine RAM Pointer     10-bit address
    Load Subroutine RAM             16-bit data

IMMEDIATE

    Jump Immediate                  16-bit address
    Jump Immediate Conditional      16-bit address
    Loop Immediate                  16-bit address

RELATIVE

    Jump Relative                   16-bit relative address
    Jump Relative Conditional       16-bit relative address
    Loop Relative                   16-bit relative address

INDEXED

    Call                            10-bit index
    Call Conditional                10-bit index

INTERRUPTS

    Set Interrupt Mask              8-bit data
    Reset Interrupt                 8-bit data

OTHER

    No Operation                    no data
    Return                          no data
    Return Conditional              no data
    Push                            no data
    Pop                             no data

The method of using indexed subroutine calls allows each software module
to be assembled, linked, and located at a base address of 0000h. The modules
may then be loaded into program memory and called by their index number. Each
call accesses the subroutine RAM by index number, and the subroutine RAM then
loads the program counter with the address corresponding to the physical
location of the module. Care must be taken when programming the modules not
to use immediate instructions, since an immediate address is absolute and is
no longer valid once the module is loaded away from 0000h. Implementing the
interrupt vector table in the same RAM as the subroutine indices, and separate
from the stack RAM, provides for the simultaneous access of both banks of RAM
during a Call instruction. This allows the present address in the program
counter to be pushed onto the stack at the same time that the new 'call'
address is presented to the program counter, giving a 50 nsec single cycle
instruction. By placing the interrupt table in the subroutine RAM, the same
single cycle instruction may push the program counter onto the stack upon
detection of a hardware interrupt. This also simplifies hardware design, since
the latches necessary to hold the RAM address while loading in data need not
be present for the stack RAM.
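
The indexed call mechanism described above can be illustrated with a small
software model. This is only a sketch: the class name, method names, and the
use of a Python dictionary and list for the subroutine RAM and stack RAM are
assumptions made for illustration, not part of the CPH design.

```python
class IndexedCallModel:
    """Toy model of indexed CALL/RET (all names here are hypothetical)."""

    def __init__(self):
        self.pc = 0                 # program counter
        self.subroutine_ram = {}    # index -> physical start address
        self.stack_ram = []         # return addresses, a separate bank

    def load_module(self, index, physical_address):
        # Models pointing at an index and writing the module's address there.
        self.subroutine_ram[index] = physical_address

    def call(self, index):
        # Both banks are used in the same step: the return address goes to
        # the stack RAM while the target comes from the subroutine RAM.
        self.stack_ram.append(self.pc + 1)
        self.pc = self.subroutine_ram[index]

    def ret(self):
        self.pc = self.stack_ram.pop()


model = IndexedCallModel()
model.load_module(3, 0x0400)   # module 3 happens to be loaded at 0400h
model.pc = 0x0010
model.call(3)
address_after_call = model.pc
model.ret()
address_after_return = model.pc
```

Because the caller refers to the module only by index 3, the module can be
loaded at any physical address without reassembling the caller.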

The following features are supported:

A 2-to-1 MUX allows the immediate/relative address to come from a source
external to the microsequencer. The stack and subroutine RAM is 4k x 16 in
size. An additional output MUX and a tri-state buffer were added to create
two separate buses, one dedicated to the microsequencer and a second that
drives the external RAM. This helps guarantee that the tight timing
requirements of the microsequencer won't be compromised.

Several restrictions on instruction sequences have been eliminated by
designing the stack pointer out of PALs rather than discrete up/down counters.
Prior to the change, CALL and PUSH type instructions, which increment the
stack pointer after writing to the stack, conflicted with RET instructions,
which decrement the stack pointer before reading from it. The solution
required that a 40 MHz clock be brought in and logic added to compare the
previous instruction to its successor and decide at each 20 MHz clock whether
to increment or decrement for the CALL, PUSH, and RET type instructions. For
POP, LS, TWBI, and TWBR instructions, where the data is merely discarded from
the stack, this is done using the 40 MHz clock at mid-instruction.

The full instruction set now follows. Since the instructions are
'microcoded' using PALs, and the PALs have many product terms remaining,
additional instructions may be added as required without changing any
hardware.

NOTE: In the following description /CNTO refers to the loop counter's
terminal count, which goes low upon reaching zero, and /COND is a condition
bit which indicates a true condition when low.


INSTRUCTION SET

NOP      No Operation
LDLC     Load Loop Counter
LDSP     Load Stack Pointer
LDSRP    Load Subroutine RAM Pointer
LDSUBR   Load Subroutine RAM
SIM      Set Interrupt Mask
RIM      Reset Interrupt Mask
RINT     Reset Interrupt
JI       Jump Immediate
JIC      Jump Immediate Conditional
JR       Jump Relative
JRC      Jump Relative Conditionally
LI       Loop Immediate
LR       Loop Relative
LS       Loop Stack
TWBI     Three-Way Branch Immediate
TWBR     Three-Way Branch Relative
CALL     Call
CALLC    Call Conditional
RET      Return
RETC     Return Conditional
PUSH     Push
PUSHC    Push Conditionally
PLDLC    Push and Load Loop Counter
PLDLCC   Push and Load Loop Counter Conditionally
POP      Pop (Discard Top of Stack)
POPC     Pop Conditionally (Discard Top of Stack)
EI       Enable Interrupts
DI       Disable Interrupts

When a data field of less than 16 bits is specified, the data is to be
right justified into the lowest bits possible. For example, an 8-bit number
A5h will become 00A5h in the 16-bit data field.
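
The right-justification rule can be checked with a few lines of Python. This
is a minimal sketch; the function names are hypothetical and the range check
is an added assumption rather than documented assembler behavior.

```python
def right_justify(value, width_bits):
    """Place a value of the given width in the low bits of a 16-bit field."""
    if not 0 <= value < (1 << width_bits):
        raise ValueError("value does not fit in the stated width")
    return value & 0xFFFF   # upper bits of the 16-bit field are zero


def as_hex_field(value, width_bits):
    """Render the 16-bit data field in the report's 'h' suffix notation."""
    return f"{right_justify(value, width_bits):04X}h"


example_field = as_hex_field(0xA5, 8)
```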


Mnemonic OpCode Data     Description

NOP      07Fh   --       Does nothing but consume time. The next address
                         is the program counter + 1.

LDLC     07Eh   16-bits  Load loop counter with the data appearing in the
                         data field. The next address is the program
                         counter + 1.

LDSP     07Dh   12-bits  Load stack pointer with the data appearing in the
                         data field. The next address is the program
                         counter + 1.

LDSRP    07Ch   16-bits  Load subroutine RAM address pointer with the data
                         appearing in the data field. The next address is
                         the program counter + 1.

LDSUBR   07Bh   16-bits  Write the data to the subroutine/interrupt RAM
                         location pointed to by the subroutine address
                         pointer last loaded using the LDSRP instruction.
                         The next address is the program counter + 1.


SIM      07Ah   8-bits   Set interrupt masks indicated in the data field.
                         Each bit in the data field corresponds to one
                         interrupt. The least significant bit corresponds
                         to interrupt 0 (/INT0), which has the lowest
                         priority, up through the most significant bit for
                         interrupt 7 (/INT7), which has the highest
                         priority. Wherever a bit is set to one in the
                         data field the corresponding mask will be set.
                         The next address is the program counter + 1.

RIM      079h   8-bits   Resets interrupt masks indicated in the data
                         field. Each bit in the data field corresponds to
                         one interrupt. The least significant bit
                         corresponds to interrupt 0 (/INT0), which has the
                         lowest priority, up through the most significant
                         bit for interrupt 7 (/INT7), which has the highest
                         priority. Wherever a bit is set to one in the
                         data field the corresponding mask will be reset.
                         The next address is the program counter + 1.

RINT     078h   8-bits   Resets the interrupts indicated in the data
                         field. Each bit in the data field corresponds to
                         one interrupt. The least significant bit
                         corresponds to interrupt 0 (/INT0), which has the
                         lowest priority, up through the most significant
                         bit for interrupt 7 (/INT7), which has the highest
                         priority. Wherever a bit is set to one in the
                         data field the corresponding interrupt will be
                         reset. The next address is the program counter
                         + 1.

JI       077h   16-bits  Jump to the address specified in the data field.

JIC      076h   16-bits  Jump to the address specified in the data field
                         only if the /COND signal is low, else the next
                         address is the program counter + 1.

JR       075h   16-bits  Jump to the address created by adding the program
                         counter to the data field.

JRC      074h   16-bits  Jump to the address created by adding the program
                         counter to the data field only if the /COND
                         signal is low, else the next address is the
                         program counter + 1.

LI       073h   16-bits  If /CNTO is high, indicating that the loop counter
                         has not yet reached 0, then jump to the address
                         specified in the data field.

                         If /CNTO is low the next address is the program
                         counter + 1.

LR       072h   16-bits  If /CNTO is high, indicating that the loop counter
                         has not yet reached 0, then jump to the address
                         created by adding the program counter to the data
                         field.

                         If /CNTO is low the next address is the program
                         counter + 1.

LS       071h   --       If /CNTO is high, indicating that the loop counter
                         has not yet reached 0, then jump to the address
                         located on the top of the stack. This address is
                         to remain on the top of the stack after the jump.

                         If /CNTO is low, then the jump address on the top
                         of the stack is discarded and the next address is
                         the program counter + 1.

TWBI     070h   16-bits  If /CNTO is high, indicating that the loop counter
                         has not yet reached 0, and /COND is high,
                         indicating a false condition, then jump to the
                         address located on the top of the stack.

                         If /CNTO is low and /COND is high then jump to the
                         address specified in the data field. The address
                         on the top of the stack is discarded.

                         If /COND is low then the next address is the
                         program counter + 1 and the address appearing on
                         top of the stack is discarded.

TWBR     06Fh   16-bits  If /CNTO is high, indicating that the loop counter
                         has not yet reached 0, and /COND is high,
                         indicating a false condition, then jump to the
                         address located on the top of the stack.

                         If /CNTO is low and /COND is high then jump to the
                         address created by adding the program counter to
                         the data field. The address on the top of the
                         stack is discarded.

                         If /COND is low then the next address is the
                         program counter + 1 and the address appearing on
                         top of the stack is discarded.


CALL     06Eh   12-bits  The current program counter is incremented and
                         stored onto the top of the stack. The program
                         then jumps to the address appearing in the
                         subroutine/interrupt RAM at the SUBRAM address
                         given in the data field.

CALLC    06Dh   16-bits  If /COND is low then the current program counter
                         is incremented and stored onto the top of the
                         stack. The program then jumps to the address
                         appearing in the subroutine/interrupt RAM at the
                         SUBRAM address given in the data field.

                         If /COND is high then the next address is the
                         program counter + 1.


RET      06Ch   --       Jump to the address appearing on the top of the
                         stack.

RETC     06Bh   --       If /COND is low then jump to the address appearing
                         on the top of the stack.

                         If /COND is high then the next address is the
                         program counter + 1.

PUSH     06Ah   --       Store the program counter + 1 on the top of the
                         stack. The next address is the program counter
                         + 1.

PUSHC    069h   --       If /COND is low then store the program counter + 1
                         on the top of the stack. The next address is the
                         program counter + 1.

PLDLC    068h   16-bits  Store the program counter + 1 on the top of the
                         stack. Load loop counter with the data appearing
                         in the data field. The next address is the
                         program counter + 1.

PLDLCC   067h   16-bits  Store the program counter + 1 on the top of the
                         stack. NOTE: The preceding push is not
                         conditional. If /COND is low, then load the loop
                         counter with the data appearing in the data field.
                         The next address is the program counter + 1.


POP      066h   --       Discard the data appearing on the top of the
                         stack. The next address is the program counter
                         + 1.

POPC     065h   --       If /COND is low then discard the data appearing on
                         the top of the stack. The next address is the
                         program counter + 1.

EI       064h   --       Enable future and pending unmasked interrupts to
                         be serviced. The next address is the program
                         counter + 1.

DI       063h   --       Disable all interrupts from being serviced. The
                         next address is the program counter + 1.
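
The branch rules in the table above can be summarized as a next-address
function. The following is a minimal sketch of a few of the instructions, not
the PAL logic itself. The function name, the argument names `cond_n` and
`cnt0_n` (standing for the active-low /COND and /CNTO signals), the 16-bit
wraparound, and leaving the stack untouched when a three-way branch loops back
are assumptions. Loop counter decrementing is not modeled.

```python
def next_address(op, pc, data=0, cond_n=1, cnt0_n=1, stack=None):
    """Return the next program address for a subset of the instructions.

    cond_n == 0 means the condition is true (/COND low).
    cnt0_n == 0 means the loop counter has reached zero (/CNTO low).
    """
    stack = [] if stack is None else stack
    mask = 0xFFFF                    # assumed 16-bit address space
    step = (pc + 1) & mask           # default: program counter + 1
    relative = (pc + data) & mask    # program counter added to data field

    if op == "NOP":
        return step
    if op == "JI":
        return data
    if op == "JIC":
        return data if cond_n == 0 else step
    if op == "JR":
        return relative
    if op == "JRC":
        return relative if cond_n == 0 else step
    if op == "LI":
        return data if cnt0_n == 1 else step
    if op == "LR":
        return relative if cnt0_n == 1 else step
    if op == "LS":
        if cnt0_n == 1:
            return stack[-1]         # address remains on the stack
        stack.pop()                  # loop finished: discard the address
        return step
    if op in ("TWBI", "TWBR"):
        if cond_n == 0:              # condition true: fall through
            stack.pop()
            return step
        if cnt0_n == 1:              # still looping: back to top of stack
            return stack[-1]
        stack.pop()                  # count exhausted, condition false
        return data if op == "TWBI" else relative
    raise ValueError(f"unsupported mnemonic: {op}")
```

For example, `next_address("JIC", 5, data=0x100, cond_n=0)` takes the jump,
while the same call with `cond_n=1` falls through to address 6.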


Microinstruction productions for the CPH need to account for the timing
delays in the crossbar, both in the processor and in the address generator.
When selecting a pass-through transfer, or "in to out", in any direction,
clock 1 selects the path (SEL). Clock 2 latches the input data. At clock 4
the output data is available to the destination. To write data into the
register file, clock 1 selects the path (SEL), the register address, and the
write enable signal (WRENA). At clock 2 the data must be available to the
crossbar for writing into the register. To read from a register, clock 1
selects the port and the register address. At clock 3, the data is available
to the destination (mode 1 operation only). The sample microprograms in the
appendix take these delays into account and should be examined carefully.
Additional notes on microprogramming can be found in a later section.

For example, the IMMAD field, or immediate address field, is active in
both phases. From the machine definition file in the appendix, one sees that
two ASSIGN statements are used. The first statement assigns physical bits
237 thru 339. The second statement assigns physical bits 621 thru 723. The
higher order bits are reserved for the first phase and the lower order bits
are reserved for the second phase. The phase in use at any clock cycle is
selected transparently to the user. Clocking is done automatically.
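
The two ASSIGN ranges quoted above are exactly 384 bits apart, which is one
phase width of the 768-bit microinstruction. The short sketch below makes that
relationship explicit; the constant and function names are hypothetical and
are not taken from the machine definition file.

```python
PHASE_WIDTH = 384                 # bits per phase (768-bit word, two phases)
WORD_WIDTH = 2 * PHASE_WIDTH


def same_field_in_other_phase(physical_bit):
    """Map a physical bit (1..768) to the matching bit in the other phase."""
    if not 1 <= physical_bit <= WORD_WIDTH:
        raise ValueError("physical bits are numbered 1 through 768")
    if physical_bit <= PHASE_WIDTH:
        return physical_bit + PHASE_WIDTH
    return physical_bit - PHASE_WIDTH


low_range = (237, 339)            # first ASSIGN statement
high_range = tuple(same_field_in_other_phase(b) for b in low_range)
```

Here `high_range` evaluates to the second ASSIGN range, bits 621 thru 723.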

3.2.10 Backplane

The CPH backplane depicted in Figure 47, entitled "Backplane", is a custom
backplane with the footprint of a 9U VME board. However, all CPH boards
require many more backplane pins than can be provided on the P1, P2, and P3
connectors of a standard VME bus. Special connectors from AMP were designed
into the custom backplane. The backplane must also have pinouts for the
processor board which are different from those for the address generator and
cache memory boards, because the processor board can be cascaded with other
processor boards. Each processor board must then generate different addresses
to cache. The connector lists for the processor, cache, and address generator
boards follow in Figures 48 and 49.

The physical configuration of the backplane consists of 9 slots and
three left open for future expansion. Each connector will be placed on a
0.800 inch center-to-center spacing. The slot assignments are listed next.

Backplane Slot Assignment

Slot Number   System   Assignment

     1           1     IOP
     2           1     PROCESSOR
     3           1     EMPTY
     4           1     ADDR
     5           1     EMPTY
     6           1     CACHE MEMORY
     7           2     PROCESSOR
     8           2     EMPTY
     9           2     CACHE MEMORY

Slots  3,  5,  and  8  are  empty  to  allow  the  tall  boards  to  have  clearance. 

This backplane supports two CPH systems. The two systems share a common
system clock, microsequencer address signals, and power, but all data and
memory address buses are isolated between slots 6 and 7. This allows each
system to access independent memory and data, and even to execute different
microcode, with the constraint that both systems have the same microsequencer
generating a common program address.

The clock circuitry for the backplane remains to be designed. The
initial design should support all phases of the CPH clock and should support
single stepping. The single stepping feature can be installed on the
frontplane with a debounce switch and as an alternating TTL signal from the
IOP. The ECL-to-TTL conversion should be done on the backplane.

4.0 Microprogramming the CPH

Microprogramming the CPH is done with the microassembler provided,
MicroAsm. A user develops an assembly level program using the MicroAsm
assembler syntax. A predefined description of the CPH has been entered into
the Genasm files. A typical production of the microcode for an assembly level
application program uses the following command line.

Microasm  mulm.asm  -cph  -f 

This command line uses the predefined machine definition tables of the cph
file and generates the microcode for the mulm.asm assembly level code. Output
will be in a file labeled "mulm.ldf".

4.1 Theory of Operation

Generating microprograms for the CPH requires the MICROASM retargetable
microassembler. There are three programs, entitled GENASM, MICROASM, and MPP.
These three executable files should be in the directory in which you are
writing the assembly level programs. As an example, the following sequence of
steps is necessary to produce a binary file for the machine. The output file
will have the root name of your source and the extension "LDF".

4.1.1 Sequence of Steps

To  create  and  assemble  a  program,  two  steps  are  necessary  as  follows: 

1. Create your assembly level program with any text editor.
   Save as an ASCII file only.

2. Keystroke the following command line:

MICROASM <YOUR FILE NAME.ASM> -tCPH -f

This is the entire sequence. This example uses the already developed
tables for the CPH, which should be in your directory. The "-f" string tells
MicroAsm to produce a binary output PROM file with the root name of your
assembly program.

4.1.1.1 An Example

On the disk provided are 18 files, including Microasm.exe, Genasm.exe,
MPP.exe, CPH.FIX, MULM.ASM, MULM.LDF, DAFY.FIX, and DAFT.LDF. To produce a
PROM readable file in binary from the MULM.ASM assembly program, type the
following:

MICROASM  MULM.ASM  -tCPH  -f 

This command line will assemble the program called MULM.ASM, using the machine
description found in the CPH.FIX files, and produce MULM.LDF. After completing
the steps, examine the MULM.LDF file. It should have four microinstructions,
each 768 bits wide. The source program, MULM.ASM, is found in the appendix
along with the MULM.LDF and CPH.FIX machine description file. Verify that the
micro orders in the LDF file agree with your syntax in MULM.ASM.

4.1.1.2 The LDF Files


LDF files are produced by appending the symbols "-f" to the Microasm
command line. The output file will have the same root name as the ASM file
but will have the LDF extension. This file is used to produce the PROM words.
The LDF file can be viewed to verify the bits in each microorder selected by
your assembly program. For example, an AAA.LDF file was created from the
AAA.ASM file in your example section. It is two microinstructions long. The
very first bit in the upper left corner is physical bit 768. The lower
rightmost bit is physical bit 1. The most significant 384 bits represent
phase 1 microorders in each microinstruction while the least significant 384
bits represent the phase 0 microorders. To locate individual fields, compare
the MI format drawing with the LDF file. Be careful: some of the fields are
spread across isolated physical bits. The immediate address field is one.
ADDRESS RAMI is another. There is potential for confusion in several areas.
These are clarified in the sections below.
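
The bit numbering just described can be sketched in a few lines. This example
assumes, purely for illustration, that one microinstruction has already been
read into a single 768-character string of '0' and '1' with the upper-left bit
first; the actual line layout of an LDF file is not described here, and the
function names are hypothetical.

```python
WORD_BITS = 768
PHASE_BITS = 384


def physical_bit(word, n):
    """Return physical bit n (768 = first character, 1 = last character)."""
    if len(word) != WORD_BITS or not 1 <= n <= WORD_BITS:
        raise ValueError("expected a 768-bit word and a bit number 1..768")
    return word[WORD_BITS - n]


def split_phases(word):
    """Split a word into (phase 1 microorders, phase 0 microorders)."""
    if len(word) != WORD_BITS:
        raise ValueError("expected a 768-bit word")
    return word[:PHASE_BITS], word[PHASE_BITS:]


# A made-up word with only physical bits 768 and 1 set.
sample_word = "1" + "0" * 766 + "1"
phase1, phase0 = split_phases(sample_word)
```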

4.1.2.1 Default Bits

In order to avoid having to specify all bits of a microinstruction in
each assembly instruction, default values are specified in the CPH
description. There is a default value for each of the fields as well as for
each subfield of each field. There is also a global default bit value,
specified with the defbit directive, that is used when the proper default is
not available. Since all fields and subfields in the CPH description have
defaults specified, this global default bit will never be used.

When a field is not specified at all in an instruction (no
$<field_name>), then the default for the entire field is used. If there is no
default for the entire field, the global default bit value is used instead.
When the field is specified but a subfield is left out, either between commas
or at the end, the default for the subfield is used. If there is no default
for the subfield, the global default bit is used again rather than the field
default. Any or all of the subfields can be left out and they will be
replaced with the subfield defaults. For most of the fields, the default
values are the same in the field as in the subfields. The exceptions are the
$CCS, $IMM and $MWR fields. The $SEQ field is also unusual because
assignments to the physical bits have been made from its subfields rather than
the entire field. For that reason, the $SEQ field default has no effect and
the $SEQ field must be specified in an instruction to keep it from getting a
"don't care" value. It need not be given any subfield values, as they will
default to a continue instruction, but a $SEQ must be present. Physical
fields which are not assigned any bit values at all will get "don't care"
values.
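
The default rules above amount to a small lookup procedure. The sketch below
models it under stated assumptions: a field is a list of subfield values,
`None` stands for "not specified" or "no default", and the function name and
the example default values are hypothetical rather than taken from the CPH
description.

```python
def resolve_field(given, field_default, subfield_defaults, global_default):
    """Return the subfield values the assembler would use for one field.

    given             -- None if the field is absent from the instruction,
                         else a list with None for each omitted subfield
    field_default     -- list of values for the whole field, or None
    subfield_defaults -- one entry per subfield, None where undefined
    global_default    -- the global default bit value
    """
    count = len(subfield_defaults)
    if given is None:
        # Field not written at all: whole-field default, else global bit.
        if field_default is not None:
            return list(field_default)
        return [global_default] * count
    padded = list(given) + [None] * (count - len(given))
    resolved = []
    for value, sub_default in zip(padded, subfield_defaults):
        if value is not None:
            resolved.append(value)
        elif sub_default is not None:
            resolved.append(sub_default)
        else:
            # No subfield default: global bit, not the field default.
            resolved.append(global_default)
    return resolved


# Modeled on a field whose field default is Disable but whose subfield
# default is Enable.
absent = resolve_field(None, ["DIS"], ["EN"], 0)
present_but_empty = resolve_field([], ["DIS"], ["EN"], 0)
```

With these made-up defaults, leaving the field out yields the field default,
while writing the field with no subfield value yields the subfield default.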


4.1.2.2 Immediate Data

To use the immediate field, it is necessary to specify $IMM or $IMM EN
(Disable is the default value for the field, but ENable is the default value
for the subfield). The data value is held in the $REG field and must be
specified by filling in each of the subfields of the $REG field with the
appropriate number of bits from its binary representation. For example, to
specify the value

0B000011110000111100001111000011110000

would require

$REG 0X01,0X38,0X0F,0X03,0X3,0X03,0X30

Use only hex or octal format in Microasm. Do not use binary. This is
inconvenient, but the immediate field should not be needed very often anyway.

The immediate address field can also be used to send literal addresses
to the program counter. It is done similarly. For example, the microorder
$IMMADD 0xFFFF will emit the bits 0B111111111111 in the immediate address
field.


4.1.2.3 CPH MI Format

When assembling microcode for the CPH, the format shown in Figure 50
applies. An MI word is 384 bits long, partitioned into 8 ROMs. A single MI is
mapped as shown across several physical devices. Care must be exercised in
downloading the code from the host so that the words map accordingly.

4.2 Algorithms

Severe computational requirements are placed upon WSMR radar and
telemetry installations when multiple sensing and unreliable data acquisition
occur. Decentralized tracking via the new Square Root Information Filter
(SRIF) offers exceptional promise. SRIFs easily handle sensor misalignment,
adaptation to unexpected randomness, and noisy telemetry. The optimal tracker,
however, must be computationally efficient and fast. The tracker must also
correlate multiple objects with measurements, requiring the tracking filter to
be run on different sequences of measurements. To be reliable, the tracker
must be numerically stable under extremely tight real time constraints.
Figure 51, entitled "Decentralized SRIF Architecture", depicts the typical
processing chain and Figure 52 depicts the distributed/parallel architecture
for combining local processors into the decentralized tracker scheme.

Both the CPH and the VPH boards can serve as the local processor for the
SRIF. Where significant vector operations are required, the VPH excels in
real-time performance. When significant matrix manipulations occur, the CPH
is the better choice. It is anticipated that the major computational task is
the matrix inversion, which is highly sensitive to the ill-conditioning of the
matrix. Matrix ill-conditioning can be quantified by the Mel-Penrose index.
This index is the absolute value of the difference between the largest
eigenvalue and the smallest eigenvalue. In practical terms, this index is a
measure of the difference between the largest energy signal and the smallest
energy signal.

Matrix inversion can be accomplished by LU factorization, Gaussian
elimination, Gram-Schmidt factorization, Hermitian matrix inversion, scaled
Givens rotations, and general matrix inversion.

Figure 50. CPH ROM Format. Each column represents a single x8 ROM
(January 27, 1992).

Decentralized Tracking via New Square Root Information Filter (SRIF) Concepts

Figure 52. Distributed/Parallel Architecture

Algorithm development qualitatively meeting all objectives completed.

High fidelity simulation demonstrating quantitative performance in progress.

Technical breakthrough offering potential for dramatic improvement in
meeting tracking needs with relaxed requirements.


The  computational  budget  for  a  complete  SRIF  is  the  following: 

SRIF  Computational  Budget 


Matrix  Inversion  40% 
Vector  Multiplication  26% 
Correlation  14% 
Numerical  Integration  6% 
Scalar  Manipulation  14% 


The major tasks include adaptive tracking, nonlinear filtering, batch
initialization, sensor control, and track correlation. Adaptive tracking can
be accomplished via several methods, some of which are listed in Figure 53,
entitled "Adaptive Algorithms". They include the LMS, RLS, FLA, FTF, and
SFTF. Note that the LMS is a slow tracker but its computational complexity
(number of equivalent multiplications) is low. The SFTF is fast but its
computational complexity is 4.5 times worse than the LMS. The FTF is not
stable; therefore it is not suitable for the SRIF or the EVA architecture.

During April 1990, a new algorithm was investigated for the time motion
resolution task at WSMR, because this is a very demanding and time consuming
application for WSMR. It was found that the new algorithm could improve and
enhance the analysis of signals which are both time and frequency limited,
without the need for long windows as is required when using the Fourier
transform. Because this new algorithm, called the Wigner-Ville transform, has
significant improvements over the Fourier transform, an intensive analysis of
its features was made and applied to the CPH. The CPH as currently configured
appears to support this important new discovery.

The Mentor target tracker algorithms (a realization of DSRIF) were also
examined carefully for implementation into either or both the VPH and CPH.
The basic sequence of steps in the computations is as follows:

1. Take measurements (range, rate,...)

2. Execute local filters in parallel

3. Merge 1 at the global level

4. Local filter time update

5. Global merge

However, additional equations need to be computed in order to support steps 1
through 5. All matrices appear to be less than 25 x 25 elements in size.
There are no real-time matrix inversion operations. One inversion is needed
at the onset, however. Several orthogonal transformations are needed but
appear to be straightforward. Givens rotations were suggested by Dr. Mitch
Belza for some matrix manipulations.


LMS  :  stochastic  gradient  algorithm  (Least  Mean  Squares) 
RLS  :  ordinary  Recursive  Least-Squares  algorithm 
FLA  :  fast  RLS  algorithm  in  lattice  form 
FTF  :  fast  RLS  algorithm  in  transversal  filter  form 

4.2.1 Algorithms for Solving Linear Systems

STC's design review of the VPH, with respect to providing a full range
of math functions, has yielded a healthy respect for its calculation
capabilities. The VPH has 4 separate calculation units which can run in
parallel, each of which can perform a square root in approximately 1.52
microseconds and a division in less than 1 microsecond. While these figures
are not the fastest in the world, they are very respectable when viewed in
the context of the architecture's main function, FFTs, which require complex
multiply accumulates. This speed and flexibility allow the architecture to
provide a wealth of processing speed which can be used for virtually any
mathematical function which might need to be performed. When the overall
speed of the existing VPH architecture was compared with an architecture
utilizing an additional processing unit such as the BIT chip, the cost to
performance ratio of the speedup was very poor and the possible enhancement
was discarded.

Many different algorithms solve matrix equations, and most of them rely
on triangularization of the input matrix. Triangularization is invariably
followed by some sort of substitution to find the solution vector. Thus, the
most efficient solutions are those which require the fewest calculations for
their triangularization and subsequent backsubstitution. LU factorization and
Gaussian elimination are now examined since they are important equation
solvers.

4.2.2  LU  Factorization 

One effective method of solving a linear system Rw = s is to factor the
coefficient matrix R into a product of two triangular matrices. The problem
is then reduced to solving two triangular systems. The LU factorization
produces a lower triangular matrix L and an upper triangular matrix U whose
product is the original matrix: LU = R. This factorization is computationally
simple because it consists primarily of inner product calculations. Once a
factorization is found, the solution is simply a set of backsubstitutions.
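The factor-then-substitute procedure can be sketched as follows. This is a minimal illustration, assuming a Doolittle-style factorization without pivoting on a matrix whose pivots are nonzero; the function names lu_factor and lu_solve are hypothetical and are not taken from the VPH code.

```python
def lu_factor(r):
    """Factor R into unit-lower-triangular L and upper-triangular U (no pivoting)."""
    n = len(r)
    lower = [[0.0] * n for _ in range(n)]
    upper = [[0.0] * n for _ in range(n)]
    for i in range(n):
        lower[i][i] = 1.0
        # Row i of U: each entry is an inner product subtracted from R.
        for j in range(i, n):
            upper[i][j] = r[i][j] - sum(lower[i][k] * upper[k][j] for k in range(i))
        # One reciprocal per pivot, then multiplications only.
        recip = 1.0 / upper[i][i]
        for j in range(i + 1, n):
            lower[j][i] = (r[j][i] - sum(lower[j][k] * upper[k][i] for k in range(i))) * recip
    return lower, upper


def lu_solve(lower, upper, s):
    """Solve L y = s by forward substitution, then U w = y by backsubstitution."""
    n = len(s)
    y = [0.0] * n
    for i in range(n):
        y[i] = s[i] - sum(lower[i][k] * y[k] for k in range(i))
    w = [0.0] * n
    for i in range(n - 1, -1, -1):
        w[i] = (y[i] - sum(upper[i][k] * w[k] for k in range(i + 1, n))) / upper[i][i]
    return w
```

For example, lu_solve(*lu_factor([[2, 1, 1], [4, 3, 3], [8, 7, 9]]), [7, 19, 49]) recovers the weight vector of a small real system; the same code accepts complex entries.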

In recent years, the LU decomposition has not received much attention,
both because it is not very suitable for systolic array implementation and
because it is already so well known. However, because so much is known about
it, and since the proposed implementation is a pipeline rather than an array,
the LU algorithm appears to be the best solution.

4.2.3  Gaussian Elimination

Despite origins that date from at least 250 B.C., elimination methods
are still viable as solution vehicles for linear equations. Gaussian
elimination is widely known, being the primary method taught in introductory
linear systems courses. The algorithm consists of a series of row
interchanges (called pivots), combined with subtraction of matrix elements.
It forms an upper triangular matrix by eliminating elements in the lower
triangle of the coefficient matrix. The computational complexity of Gaussian
elimination is identical to that of LU decomposition; in fact, if a specific
pivoting strategy is followed, both methods will compute with the same
accuracy.
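The row interchanges and eliminations described above can be sketched as follows. This is an illustrative version that assumes partial pivoting (largest-magnitude element in the column) as the pivoting strategy; the report does not say which strategy it has in mind, and the name gauss_solve is hypothetical.

```python
def gauss_solve(r, s):
    """Solve R w = s by Gaussian elimination with partial pivoting."""
    n = len(r)
    # Augmented matrix [R | s], copied so the inputs are left untouched.
    m = [list(row) + [rhs] for row, rhs in zip(r, s)]
    for col in range(n):
        # Pivot: bring the largest remaining entry of this column to the diagonal.
        pivot = max(range(col, n), key=lambda row: abs(m[row][col]))
        if m[pivot][col] == 0:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate the entries below the diagonal.
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Backsubstitute through the upper triangular system.
    w = [0.0] * n
    for i in range(n - 1, -1, -1):
        w[i] = (m[i][n] - sum(m[i][k] * w[k] for k in range(i + 1, n))) / m[i][i]
    return w
```

The zero in the first pivot position of [[0, 2], [1, 1]] would defeat an unpivoted elimination, but gauss_solve handles it through the row interchange.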
4.2.4  Gram-Schmidt Decomposition


Another elimination method is the Gram-Schmidt algorithm, which performs
a Cholesky factorization on Hermitian positive-semidefinite matrices. Since a
spatially distributed covariance matrix is Hermitian and positive-
semidefinite, Gram-Schmidt is a valid algorithm for consideration. Since it
is an elimination method, Gram-Schmidt operates like Gaussian elimination,
first producing an upper triangular matrix and then backsubstituting to find
w. Unlike standard Cholesky factorization, the Gram-Schmidt method requires
no square root calculations.

By 1990, researchers had designed an array processor for adaptive
beamforming based on the Gram-Schmidt algorithm. They replaced the reciprocal
calculation with a shift, essentially the reciprocal of the nearest power of
two. While this method avoids division, it solves a perturbed set of
equations. Others were able to eliminate the divisions without disturbing the
equations by generalizing the Gram-Schmidt method. Unfortunately, their
method of eliminating the reciprocal tripled the number of multiplications.
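The shift substitution, and the perturbation it introduces, can be illustrated with a short sketch. It assumes that "nearest power of two" means nearest on a logarithmic scale, which the text does not specify; the name shift_reciprocal is hypothetical.

```python
import math


def shift_reciprocal(divisor):
    """Approximate 1/divisor by 2**-e, where 2**e is the nearest power of two.

    Returns (e, approximation). In fixed-point hardware, multiplying by
    2**-e is a right shift by e bits, so no divider is needed.
    """
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    e = round(math.log2(divisor))  # assumption: nearest in the log sense
    return e, 2.0 ** -e
```

When the divisor is itself a power of two the result is exact (shift_reciprocal(8) gives 0.125), but for a divisor of 5 the approximation is 0.25 instead of 0.2, which is why the resulting equations are perturbed.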

4.2.5  Inversion of a Hermitian Matrix

Similar to the LU decomposition, inversion of a Hermitian matrix is much
easier than inversion of an arbitrary matrix. First the matrix is
triangularized, then the new matrix is formed by backsolving. The main
difference between LU decomposition and Hermitian matrix inversion is the
method of backsubstitution. Whereas LU decomposition reduces a triangular
matrix down to a vector with O(N^2) operations, the symmetric inversion
expands a triangular matrix back to a full square matrix with O(N^3)
operations.

4.2.6  Scaled Givens Rotations

Despite  a  somewhat  higher  computational  complexity,  scaled  Givens 
rotations  have  received  much  attention.  The  main  advantages  of  this  algorithm 
are: 


1.  easy implementation with a variety of parallel structures

2.  flexibility to perform several matrix operations (e.g. singular value
decomposition, diagonalization, and triangularization)

3.  ability to compute plane rotations without square roots, and with
half the multiplications of standard Givens rotations

4.  high efficiency for sparse matrix operations

5.  amenability to recursive least squares minimization techniques

Since  these  advantages  have  little  effect  on  the  solution  of  linear 
equations,  we  conclude  that  Givens  rotations  are  more  suitable  for 
calculations  other  than  a  linear  solution. 

4.2.7  Comparison of Algorithms

Though all of the algorithms perform essentially the same operation, a
determination of the weight vector w, they are not equal in complexity.
Table 1 gives a comparison of the number of operations (real multiplies,
reciprocals, and additions) needed for each of the methods.

Table 1.  Complexity of Solutions to Simultaneous Equations

Algorithm              Op.     Number of Operations              Total
                                                                 for N=32
-------------------------------------------------------------------------
LU Factorization       Mult:   2/3 N^3 + 5 N^2 + 7/3 N            27,040
                       Recip:  N                                      32
                       Add:    2/3 N^3 + 4 N^2 - 2/3 N            25,920

Gaussian Elimination   Mult:   2/3 N^3 + 5 N^2 + 7/3 N            27,040
                       Recip:  N                                      32
                       Add:    2/3 N^3 + 4 N^2 - 2/3 N            25,920

Gram-Schmidt           Mult:   2 N^3 + 2 N^2 - 4 N                67,456
Factorization                  (6 N^3 - 2 N^2 - 4 N)            (194,432)
(and Division-Free     Recip:  N                                      32
Version)                                                             (0)
                       Add:    2 N^3 + 2 N^2 - 4 N                67,456
                               (4 N^3 - 4 N)                    (130,944)

Inversion of           Mult:   2 N^3 + 11/2 N^2 + 3/2 N           71,216
Hermitian Matrix       Recip:  N                                      32
                       Add:    2 N^3 + 4 N^2 - 2 N                69,568

Scaled Givens          Mult:   8/3 N^3 + 105/6 N^2 + 89/6 N      105,776
Rotations              Recip:  1/2 N^2 - 1/2 N                       496
                       Add:    8/3 N^3 + 12 N^2 + 28/3 N          99,968

General Matrix         Mult:   29/6 N^3 + 3 N^2 - 53/6 N + 5     161,173
Inversion              Recip:  N                                      32
                       Add:    29/6 N^3 - 2 N^2 - 11/6 N + 5     156,277

Table 1 shows the computational superiority of the LU factorization and
Gaussian elimination methods in terms of multiplications and additions. If
reduction of divisions is the primary goal, then one of the Gram-Schmidt
algorithms should be used. One can also see that general matrix inversion is
the most complex, and therefore the least desirable, of the methods.
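The multiply and add totals quoted for N = 32 can be re-derived from the polynomial counts. The sketch below is an arithmetic check only, and it rests on one assumption: the linear term of the LU and Gaussian multiply count is taken as +7/3 N, the sign that reproduces the printed total of 27,040 and yields whole-number counts. The dictionary keys and the name operation_counts are illustrative.

```python
from fractions import Fraction as F


def operation_counts(n):
    """Return {method: (real multiplies, real additions)} for an N x N system."""
    n2, n3 = n * n, n * n * n
    return {
        "LU / Gaussian elimination": (
            F(2, 3) * n3 + 5 * n2 + F(7, 3) * n,
            F(2, 3) * n3 + 4 * n2 - F(2, 3) * n,
        ),
        "Gram-Schmidt": (2 * n3 + 2 * n2 - 4 * n, 2 * n3 + 2 * n2 - 4 * n),
        "Gram-Schmidt, division-free": (6 * n3 - 2 * n2 - 4 * n, 4 * n3 - 4 * n),
        "Hermitian inversion": (
            2 * n3 + F(11, 2) * n2 + F(3, 2) * n,
            2 * n3 + 4 * n2 - 2 * n,
        ),
        "Scaled Givens": (
            F(8, 3) * n3 + F(105, 6) * n2 + F(89, 6) * n,
            F(8, 3) * n3 + 12 * n2 + F(28, 3) * n,
        ),
        "General inversion": (
            F(29, 6) * n3 + 3 * n2 - F(53, 6) * n + 5,
            F(29, 6) * n3 - 2 * n2 - F(11, 6) * n + 5,
        ),
    }


if __name__ == "__main__":
    for method, (mults, adds) in operation_counts(32).items():
        print(f"{method:30s} mult={int(mults):>7,d}  add={int(adds):>7,d}")
```

Exact rational arithmetic is used so that the fractional coefficients do not introduce rounding error.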

Because LU factorization is the fastest of the algorithms, and because
Maron shows that it is easier to implement than Gaussian elimination, use of
LU factorization is suggested. Our studies show that LU factorization is
computationally simpler than other methods, and other publications recommend
it as the optimum algorithm for solutions of simultaneous equations. For
those reasons, implementation research currently focuses on efficient
circuits for LU factorization.

Table 2 compares several least-squares computational techniques. The
normal equations, Householder-Golub factorization, standard Givens rotation,
fast Givens rotation, scaled Givens rotation, and Gram-Schmidt methods are
considered. Both the normal equations and the Householder-Golub technique
require global communications. Additionally, these two techniques are
sensitive to ill-conditioned matrices. Hence, neither the normal equations
nor the Householder-Golub method is amenable to systolic implementation. The
Gram-Schmidt method, included for completeness, is not recursive and,
therefore, is not considered for systolic implementation.

The  remaining  methods  are  based  on  the  Givens  rotation  triangular 
decomposition.  The  standard  Givens  rotation  requires  pivoting  as  well  as 
square-root  computation.  This  slows  the  computation  on  systolic  arrays.  The 
square-root  free  Givens  rotation  eliminates  the  square-root  computation  but 
still  requires  pivoting.  The  scaled  Givens  rotation  eliminates  both  the 
square-root  computation  and  pivoting.  Additionally,  the  scaled  Givens 
rotation  operates  on  matrix  bands.  It  is  not  necessary  to  perform  any 
computation  on  bands  that  contain  only  null  elements.  A  computational  savings 
is  realized  if  the  data  matrix  is  in  banded  form.  Note  that  the  square-root 
free  and  scaled  Givens  rotations  require  half  as  many  multiplies  as  the 
standard  Givens  rotation.  The  scaled  Givens  rotation  only  requires  1  division 
operation  as  opposed  to  2  in  the  square-root  free  rotation.  Apparently  the 
scaled  Givens  rotation  is  superior  to  the  other  methods  studied  both  in  terms 
of  computation  speed  and  systolic  implementation  complexity. 
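For reference, the square root that the fast and scaled variants avoid appears in the standard Givens rotation as the normalization of the rotation coefficients. The sketch below shows the standard real-valued rotation only; the scaled variant is not reproduced here, and the function names are illustrative.

```python
import math


def givens(a, b):
    """Return (c, s) such that [[c, s], [-s, c]] applied to (a, b) gives (r, 0)."""
    r = math.hypot(a, b)  # the square root that the scaled variants eliminate
    if r == 0.0:
        return 1.0, 0.0
    return a / r, b / r


def rotate_rows(c, s, upper_row, lower_row):
    """Apply the rotation to two matrix rows; the lower row's first entry becomes 0."""
    new_upper = [c * u + s * v for u, v in zip(upper_row, lower_row)]
    new_lower = [-s * u + c * v for u, v in zip(upper_row, lower_row)]
    return new_upper, new_lower
```

Each rotated pair of elements costs four multiplications here, which is the baseline against which the halved multiplication count of the square-root-free and scaled forms is measured.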




Table 2.  Weighted Least Squares Computational Methods

Methods compared:  Normal Equations;  Householder-Golub;  Standard Givens
Rotations;  Fast Givens Rotations;  Scaled Givens Rotations;  Gram-Schmidt.
Spanning header:  Triangular Decompositions.

Systolic Amenable
    Normal Equations:           Non-nearest-neighbor data paths
    Householder-Golub:          Requires global comm.
    Standard Givens Rotations:  Yes, but is slow and processor complex;
                                recursive; separate back-substitution
                                systolic array
    Fast Givens Rotations:      If factored; square-root-free operation,
                                nearest neighbor; pivoting increases
                                data flow complexity
    Scaled Givens Rotations:    No pivots and square-root free;
                                nearest-neighbor comm.
    Gram-Schmidt:               Not recursive

Row                          Entries
------------------------------------------------------------------------
Additions/Subtractions
Mult./Stage                  N;  N/2;  N/2
Div./Stage                   2;  1
Shifts/Im                    Scal.;  Compl.;  Scal.;  Compl.;  Complex;
                             Complex;  2
Latency                      r+c+1
Stable                       Sensitive to matrix ill-condition number;
                             Yes;  Equiv. to Standard Givens;  Well cond.
Pivoting                     2x1 vector;  2x1 vector;  None
Fading Signal Capacity       Complex;  Complex;  Complex;  Simple;  Simple
(Weighted)
Row Removal                  Complex;  Complex;  Complex;  Complex;  Simple
Idle Processors              N/2;  N/2;  N/2
Computation Time             2r+c+1;  3m+;  3(q-1)+z+1;  O(m+r)
Number of Processors         c(r+1)/2;  q(w+2);  O(w^2+zw)

Table Notation:  r = rows of rectilinear matrix, c = columns of rectilinear
matrix, n = word length

4.2.8  VPH FFT


The FFT implemented on the VPH is now described. It serves as an
introduction to the I/O-compute overlap capabilities of the VPH and should
be carefully studied. It will serve as the benchmark training program for
the VPH. Hence, a full understanding of its operation is useful for future
code development.

A 1024-point complex FFT is an ideal application for the VPH board. The
Zoran DSP chips have the FFT coefficients in ROM for up to that size. In
addition, a 1024-point FFT can be decomposed into two waves of thirty-two
32-point FFTs, each wave performing five of the ten passes required. Though
the chips are capable of 64-point FFTs in a single instruction, processing 32
points at a time is more efficient when multiple FFTs are required. This is
due to the ability of the chips to process data in half of the on-board RAM
while transferring data between external memory and the other half of the
on-board RAM. Since storing processed data and loading new data takes less
than half the time that an FFT operation does, they can effectively be done
for free even when sharing a bus between two chips working on the problem
simultaneously.

The problem is very amenable to parallel use of all four DSP chips at
once. Each chip can perform eight of the thirty-two FFT operations in each
wave. The only time synchronization is needed between the processors is
between waves. During each wave, each processor works with a distinct subset
of the points. However, the points have to be redistributed among the
processors between waves, so it is necessary to ensure that all of them
finish the first wave before the second one starts. This inherent parallelism
in the algorithm means that there is very little overhead required. The
initial load and final store operations cannot be pipelined with the FFT
operation and the parallel version has four times as many of these. They will
also occur at almost the same time for the two processors sharing a bus,
resulting in half the speed. These factors should have only about a 10%
effect on the execution time. The VPH board should therefore be able to
perform a 1024-point FFT almost four times as fast as a system with a single
Zoran DSP chip.
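The two-wave decomposition can be sketched in software as follows. This is an illustrative model, not the VSP code: a direct DFT stands in for the 32-point FFT instruction, the twiddle multiplication between waves is written out explicitly rather than handled through a coefficient-table offset, and the chips are modeled as sequential loop iterations. The names dft and two_wave_fft are hypothetical.

```python
import cmath


def dft(x):
    """Direct DFT, standing in for the chip's small-FFT instruction."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def two_wave_fft(x, size=32, chips=4):
    """Compute a (size*size)-point FFT as two waves of `size` small FFTs."""
    n = size * size
    assert len(x) == n and size % chips == 0
    per_chip = size // chips
    work = list(x)
    # Wave 1: each set holds points spaced `size` apart and is done in place.
    for chip in range(chips):
        for s in range(chip * per_chip, (chip + 1) * per_chip):
            work[s::size] = dft(work[s::size])
    # Synchronization point: every chip must finish wave 1 before wave 2.
    out = [0j] * n
    # Wave 2: each set is `size` adjacent points; results are stored with a
    # spacing of `size`, which leaves the output in normal order.
    for chip in range(chips):
        for b in range(chip * per_chip, (chip + 1) * per_chip):
            block = work[b * size:(b + 1) * size]
            twiddled = [v * cmath.exp(-2j * cmath.pi * m * b / n)
                        for m, v in enumerate(block)]
            out[b::size] = dft(twiddled)
    return out
```

Calling two_wave_fft(data) on 1024 complex points reproduces the direct 1024-point DFT; smaller cases such as size=4 are convenient for checking the index arithmetic.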

The actual code works as follows. First the processors clear their
semaphore flags to indicate that they are working. They then load their mode
register with values that indicate that the internal RAM is to be divided
into two banks and that bank references are to be inverted each time the loop
counter is decremented. Then the two index registers are set to point to the
locations for incoming and outgoing data. In the current code, the first wave
is done in place, so they point to the same locations. A single index
register could be used, but using both makes it easier to change to using a
different location for the outgoing data if desired. The index registers on
each processor are initialized to values offset by eight from the previous
processor. This allows for each processor performing eight FFTs. The loop
counter register is initialized to perform the seven fully pipelined
iterations. The first set of data points is loaded from locations spaced 32
elements apart, as required by the FFT algorithm being used when the input
data is in sequential order. Each subsequent set of data points will be
loaded from a location one element after this one, so that after eight sets
on each processor, all points will have been processed. Seven of the eight
sets are handled in a loop that loads a set into the unused RAM bank, starts
an FFT, and then begins storing the results from the previous bank. After the
loop, the final data set is stored. Outgoing data is stored in bit-reversed
order to compensate for the reversal that occurs during the FFT calculation.
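The first-wave addressing described above can be checked with a few lines of code. The sketch assumes element indices counted from zero and chips numbered 0 through 3; the name first_wave_addresses is hypothetical.

```python
def first_wave_addresses(chip, set_index, sets_per_chip=8, spacing=32, points=32):
    """Element indices loaded for one 32-point set in the first wave.

    Each chip starts eight elements after the previous chip, each later set
    starts one element after the previous set, and the points within a set
    are spaced 32 elements apart.
    """
    base = chip * sets_per_chip + set_index
    return [base + spacing * i for i in range(points)]


if __name__ == "__main__":
    touched = sorted(addr
                     for chip in range(4)
                     for s in range(8)
                     for addr in first_wave_addresses(chip, s))
    print(touched == list(range(1024)))
```

Four chips times eight sets times 32 points touch each of the 1024 elements exactly once, which is why no two processors contend for the same data within a wave.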

When each processor finishes the first wave, it uses one of its status
bits to indicate that fact. The 68020 or one of the DSP chips designated to
be master performs a full or partial handshaking operation using one of its
own status bits to synchronize the end of the first wave and the start of the
second. In the second wave, the FFTs are performed on sets of 32 adjacent
elements. Each DSP chip again handles eight adjacent sets. The output
results must be put in a separate output area this time because they are
stored with a spacing of 32 again, instead of the spacing of one that the
input is loaded from. This change in spacing performs a bit-reversal between
the bits used to index the first and second waves, just as reversing each of
the blocks during the store operations performs a bit-reversal of the index
within a wave. This results in the output being in normal order instead of
bit-reversed order. Each set is processed with an offset into the coefficient
table to provide the correct value to account for it being part of a larger
FFT. With these differences, the second wave is performed in the same manner
as the first. When all processors indicate that they are finished with the
second wave, the 1024-point FFT is complete.

The entire operation should take 133 clock cycles for the initial load
and final store of each wave, doubled for the bus sharing, plus 334 cycles
for each 32-point FFT. Allowing some extra time for synchronization, the
entire FFT should take around 475 microseconds with a 25 MHz clock. This
compares with a benchmark from Zoran of 1732 microseconds for a single chip.
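The estimate can be reproduced with a short calculation. One factor is an assumption: each counted cycle is taken to span two periods of the 25 MHz clock (80 ns). That factor is not stated in the text; it is chosen here because it reproduces both the 475 microsecond estimate and the 1732 microsecond single-chip benchmark. The function names are illustrative.

```python
CYCLES_LOAD_AND_STORE = 133  # initial load plus final store, per wave
CYCLES_PER_FFT32 = 334       # one 32-point FFT


def total_cycles(ffts_per_wave, bus_sharing_factor, waves=2):
    """Cycles for one chip's share of the 1024-point FFT."""
    per_wave = (bus_sharing_factor * CYCLES_LOAD_AND_STORE
                + ffts_per_wave * CYCLES_PER_FFT32)
    return waves * per_wave


def microseconds(cycles, clock_mhz=25.0, clocks_per_cycle=2):
    """Convert cycles to time; clocks_per_cycle=2 is an inferred assumption."""
    return cycles * clocks_per_cycle / clock_mhz


if __name__ == "__main__":
    four_chip = total_cycles(ffts_per_wave=8, bus_sharing_factor=2)
    one_chip = total_cycles(ffts_per_wave=32, bus_sharing_factor=1)
    print(four_chip, microseconds(four_chip))  # cycles and time, four chips
    print(one_chip, microseconds(one_chip))    # cycles and time, one chip
```

Under these assumptions the four-chip case needs 5876 cycles (about 470 microseconds before synchronization overhead) and the single-chip case 21642 cycles (about 1731 microseconds), a ratio of roughly 3.7.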

4.2.9  VPH Software Conventions

In order to allow the software modules on the VPH board to work together
properly, conventions must be established for their interaction. This is
particularly important because the VPH has multiple processing elements that
need to interact. The board provides a number of mechanisms for communication
between these elements. Setting conventions for how they will be used is
necessary for consistency.

VPH  Resources 

The processing elements on the VPH board are four Zoran VSP (Vector
Signal Processor) chips and a Motorola 68020 microprocessor. A VME bus
interface also allows an external processor to access the board.

There are two types of shared memory on the board. There are two local
buses with two of the four VSP chips attached to each. Local memory on each
bus is shared between the two VSPs that are attached to it. The VSP bus
protocol allows bus locking to provide the mutual exclusion necessary to use
the local memory for interprocessor communication. Each local VSP bus also
has access to a four-port memory shared by all the processors.

The 68020 has access to all system resources. This includes the local
memories on the VSP buses and registers and control locations inside the VSP
chips themselves. It cannot lock the local buses, but proper use of the VSP
control locations should allow an equivalent ability. The 68020 can also
interrupt the VSP chips. With an appropriate interrupt routine, that allows
the 68020 to preempt the buses as well.

There  is  also  a  status  latch  accessible  by  all  processors.  Each  can 
write  to  two  bits  of  the  latch  and  read  the  other  processors’  bits  from  the 
latch.  This  does  not  provide  any  capabilities  beyond  those  available  through 
shared  memory,  though  it  is  more  convenient  to  use.  In  particular,  it  does 
not  provide  a  mechanism  for  implementing  true  semaphores  to  control  access  to 
other  resources. 

Uses  of  Resources 

The resources on the VPH are not sufficient to allow completely general
synchronization of parallel tasks running on different processors without
considerable overhead. However, they are adequate for the algorithms that are
expected to execute on the VPH. Most of these algorithms will involve
splitting up a task into almost identical subtasks, each of which will be
executed on one of the VSP chips. All working VSPs will therefore need
synchronization at the same points in their subtasks. This can be performed
by using the status register and designating one of the processors as a
synchronization arbitrator. In order to maintain the symmetry between the
VSP chips, the 68020 will act in that capacity. This may not be the best
choice for future use, since the 68020 may have other tasks to perform, but it
is adequate for the present. One of the status bits for each VSP will be set
to indicate that it is finished with its last assigned task. The other will
be used to synchronize the VSPs by a full handshake with the 68020. This use
of the second bit is not strictly necessary, since the same effect could be
achieved by ending a task every time synchronization is needed. For the
initial algorithms being written, this would probably be adequate. Only the
FFTs need such synchronization and they only need it once. However, some
future algorithms might need multiple synchronization points and the overhead
of restarting the processors after each one might become excessive. Another
possible method of synchronizing would be the use of the SYNC: [XE]
instruction with a write to the $CAW location on each chip.
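The full handshake through status bits can be modeled in software. The sketch below is a simulation under stated assumptions: Python threads stand in for the VSP chips and the 68020, threading.Event objects stand in for the status latch bits, and the four-phase protocol shown is one plausible reading of "full handshake" rather than the documented VPH sequence. All names are hypothetical.

```python
import threading
import time


def run_two_wave_demo(chips=4):
    """Simulate four workers meeting at one synchronization point."""
    sync_bits = [threading.Event() for _ in range(chips)]  # one bit per VSP
    arbiter_bit = threading.Event()                        # bit owned by the arbitrator
    log = []

    def vsp(chip):
        log.append(("wave 1", chip))
        sync_bits[chip].set()            # phase 1: report arrival
        arbiter_bit.wait()               # phase 2: wait for acknowledge
        sync_bits[chip].clear()          # phase 3: withdraw the report
        while arbiter_bit.is_set():      # phase 4: wait for acknowledge to drop
            time.sleep(0)
        log.append(("wave 2", chip))

    def arbitrator():
        for bit in sync_bits:            # wait until every VSP has arrived
            bit.wait()
        arbiter_bit.set()
        while any(bit.is_set() for bit in sync_bits):
            time.sleep(0)
        arbiter_bit.clear()              # release all VSPs into the next wave

    threads = [threading.Thread(target=vsp, args=(c,)) for c in range(chips)]
    threads.append(threading.Thread(target=arbitrator))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Because the arbitrator drops its bit only after every worker has withdrawn its own, the bits are left clear and the same pair can be reused at a later synchronization point without restarting the processors.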

The bus lock on the shared local bus gives the shared local memories the
most powerful communication mechanism. Their limitation is that they can only
be used between the processors that share them. This is not useful for the
global communication required by the algorithms being executed. Therefore
this capability will not be used. The VSP chips will share code and static
tables in these memories, but not data. Each one will maintain its own
private data area. For simplicity, each will be preallocated a run-time stack
area from which it can allocate storage.

The ability of the 68020 to access the VSP memories and registers can be
used to communicate parameters such as the size and location of data to be
processed. These parameters will allow for more functionality and for the
slight differences in the tasks performed by each processor without any
duplication of code. Placing such parameters directly into the VSP registers
would give tiny performance improvements, but this is unlikely to justify the
added complexity in the 68020 code. It does give the 68020 the ability to
invoke subroutines that were written to expect parameters in registers without
needing a separate version that performs the same task using parameters on the
stack.

The parameters should be passed to each VSP by constructing a call frame
on its run-time stack. The $SP register and $PC register must be set to the
correct values so that it appears that a call has just been made. This will
allow the same routine to be invoked from the 68020 or called by the VSP
directly as part of another task. Making the call frame compatible with the
Zoran library conventions will allow that code to be used when a single VSP
chip is sufficient. In many cases, parallelism may be coarse enough that
standard library functions can even be used as part of a subtask. For
example, a dot product can be performed by four dot products on one fourth of
the vector length, followed by summing the results.
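The dot product example can be sketched directly. The code assumes the vector length is a multiple of the number of chips, and the function names are illustrative; on the VPH each partial product would be a library call running on its own VSP rather than a loop iteration.

```python
def dot(a, b):
    """Ordinary dot product, standing in for the single-chip library routine."""
    return sum(x * y for x, y in zip(a, b))


def split_dot(a, b, chips=4):
    """Dot product computed as `chips` partial dot products plus a final sum."""
    assert len(a) == len(b) and len(a) % chips == 0
    quarter = len(a) // chips
    partials = [dot(a[c * quarter:(c + 1) * quarter],
                    b[c * quarter:(c + 1) * quarter])
                for c in range(chips)]
    return sum(partials)  # the summing step, done once by one processor
```

The only communication needed is the return of four scalars, which is why such coarse-grained tasks fit the limited synchronization facilities of the board.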

In order to allow routines to act as both subroutines and main routines
invoked by the 68020, the operations on the finished bits in the status latch
must not be contained in the routines. The 68020 can reset the bits before
starting execution by writing appropriate instructions into the VSP chips'
instruction FIFOs while they are still in slave mode. The setting of the
finished bits and halting of the VSPs can be performed by setting the return
location in the constructed call frame to the beginning of a routine to
perform those functions. The final return will cause the VSP to execute those
instructions after completion of the main routine.

If a routine is going to be invoked repeatedly and it doesn't modify any
of its parameters, the same stack frame can be used again. The parameters are
still on the stack after the return. If interrupts are disabled, the return
value is still on the stack as well. Otherwise it may have been written over
by an interrupt after the return and before halting and will have to be
"pushed" back on. If there are only a small number of sets of parameters
needed and each routine needs minimal stack space, it would be possible to set
up all necessary run-time stacks beforehand and select one simply by setting
the $SP register to point to it. If the routines use too much stack space to
allow dividing up local memory in this fashion, a data area pointer could be
included in the stack frames to be used for allocation instead of the stack
pointer.

The most useful shared memory is the 4-port SRAM, since it can be
accessed by multiple processors simultaneously. For many algorithms it may be
used for all signal data, with processed data being moved out from one buffer
and replaced with new data while processing is performed on data in another
buffer. The 4-port memory is relatively small, however. It only has room for
two sets of 1K complex points. Two sets are adequate for buffering if the
algorithm can be performed in place. The 32x32 2D FFT can be performed in
place, but the 1K FFT cannot. With multiple data sets, a slower version of
the 1K FFT that can be performed in place by using an extra reordering pass
would probably allow greater overall throughput. Algorithms that use large
data sets should be written to allow in-place operation when possible.

Invocation  Conventions 

The conventions for the use of the hardware determine the mechanisms
available for communicating between software on different processors. The
Zoran library calling conventions place further constraints on the format of
VSP parameters being passed and the saving and restoring of VSP registers. In
some cases where performance is particularly important, it may be useful to
optimize the general calling sequence. Appendix C of the Zoran Software
Development Tools Manual details the calling sequence and possible
optimizations. For the early demonstration code, a caller-save convention is
likely to be more efficient than the standard callee-save. There may be no
real subroutine calls at all, and saving registers in code that is
effectively the main routine when it is invoked by the 68020 is wasteful.
Directives in the assembly code should make it easy to change to the standard
convention if desired later. Using registers to pass parameters is another
possible optimization.

Further conventions could guide the choice of what data to send in
parameters. One of the biggest issues concerns the division of effort between
the 68020 and the VSP chips. The VSP code may require values derived from the
logical parameters. These could be supplied directly by the 68020 or
determined by the VSP chips themselves. In particular, sharing a task between
multiple VSP chips requires that each perform a different subtask. They could
all be given identical task parameters along with a chip number and figure out
for themselves what subtask they are to perform. Alternatively, each could be
given different parameters determined by the 68020 to define its exact
subtask. There are advantages to each approach that must be considered before
making a choice.
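The two approaches can be contrasted in a small sketch. The parameter layout (a start index and a length) and both function names are hypothetical; they illustrate the division of effort only and do not describe an existing VPH interface.

```python
def subtask_from_chip_number(task_start, task_length, chip, chips=4):
    """Approach 1: every VSP receives the same task parameters plus its chip
    number and derives its own (start, length) share."""
    share = task_length // chips
    return task_start + chip * share, share


def subtasks_from_host(task_start, task_length, active_chips):
    """Approach 2: the host (the 68020) computes each share itself, so it can
    divide the task among any set of chips, e.g. after a VSP failure."""
    share, extra = divmod(task_length, len(active_chips))
    plan, start = {}, task_start
    for position, chip in enumerate(active_chips):
        length = share + (1 if position < extra else 0)
        plan[chip] = (start, length)
        start += length
    return plan
```

With all four chips active both approaches give chip 2 the range starting at 512 with length 256 for a 1024-element task; only the host-side version can re-divide the same task over chips 0, 1, and 3.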

Passing task parameters and chip numbers allows the 68020 to ignore the
internal operation of the VSP algorithms. If the subtasks are changed, the
68020 code to invoke the task can still remain the same. If the VSP
algorithms are invoked by a 68020 subroutine with the same parameters, it may
be possible to copy the task parameters directly from the 68020 stack to the
VSP stacks. Such a set of subroutines could be used to allow execution of VSP
code to be transparent to a 68020 programmer, much like a remote procedure
call. All of these subroutines could call a single subroutine to copy the
stack frames instead of needing to perform task-specific calculations.
Calculation of subtask parameters would be performed simultaneously on each
VSP chip, rather than serially on the 68020.

On the other hand, the 68020 instruction set is much more convenient for
performing some of these calculations, and there is an assembler available to
make it even more so. Being able to perform them in one place would make some
of the calculations themselves simpler as well. The calculations could be
done once and reused instead of redoing them every time the VSP code is
invoked. The 68020 code could gain more functionality from the VSP code by
combining VSP subtask primitives in more than one way. The standard Zoran
library functions could be invoked directly to perform subtasks rather than
having to be called indirectly from VSP routines that first determine the
correct parameters. This saves calling overhead. By controlling the task
division, the 68020 could assign differing numbers of VSP chips to a task to
allow performance of multiple tasks at the same time. This would be
constrained by the lack of general communication capabilities, but might be
useful in some cases. It would also allow re-division of tasks to provide
fault tolerance in the event of a VSP subsystem failure. These re-divisions
would be less flexible and more awkward to implement with the other method.

Other Conventions


Some instruction parameters like ROM and first pass separation (FPS) are
hardwired into instructions with no apparent way to set them from a parameter
register. ROM needs to be set to different values for different subtasks in a
large FFT. FPS, LPS (last pass separation) and ROM need to be set to
different values for a single routine to be able to handle different sized
FFTs. It may be possible to get the effect of one of FPS and LPS equal to 1
with the other less than 16 by using an appropriate $REPEAT and $HMPT
combination. The problem of needing different values for different subtasks
could be solved by having separate code for each subtask or by executing code
conditionally based on an input parameter such as chip number. It is unclear
whether the latter can be performed without multiple tests by using the ADDR
instruction for a vectored jump. The problem of setting instruction
parameters from an input parameter would be difficult to solve using
self-modifying code because the only operations that can be performed on full
word width data are floating-point. If a task only uses a routine with a
single value for an instruction, the 68020 can modify the instruction
appropriately before invoking the task. An initial ROM value can also be sent
to a routine by executing an FFT instruction with that ROM value beforehand
and using pre-addition or subtraction mode to "access" it. Some method needs
to be decided upon if very general purpose FFT routines are to be used.

A smaller problem of the same type is that the $MBS_MSS register can't
be used with partial bit reversal loads and stores the way the MBS and MSS
parameters in instructions can. This can be solved by using the same methods
used for the parameters that don't have registers, or by using extra
instructions to get the desired reversals.

Conventions also need to be established for modifying special registers
which affect the operation of the machine. The interrupt masks for arithmetic
exceptions should not be modified by the VSP routines so that the 68020 can
decide the level of error checking being performed. Some of the $MODE bits
need to be modified by specific routines to get desired modes of operation.
Some may need to have a particular value at all times. Others may need to
remain at a value determined by the 68020 for reasons similar to the
interrupt masks. If so, then all modifications to $MODE must be made by
masking instead of loading. Some method of handling interrupts when they
occur also needs to be determined. Many more such decisions will undoubtedly
arise during system development.
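
The difference between masking and loading a mode register can be sketched in a few lines. This is a minimal Python illustration, not VSP code: the bit layout, the constant OWNED_BY_68020 and the function update_mode are hypothetical names introduced only for the example.

```python
# Hypothetical layout: the upper four bits of $MODE are reserved for the 68020.
# (The report does not give the real bit assignments.)
OWNED_BY_68020 = 0b1111_0000

def update_mode(current: int, routine_bits: int, routine_mask: int) -> int:
    """Change only the bits selected by routine_mask; leave all others untouched."""
    if routine_mask & OWNED_BY_68020:
        raise ValueError("routine may not modify bits reserved for the 68020")
    # Clear the routine's bits, then merge in the new values for those bits only.
    return (current & ~routine_mask) | (routine_bits & routine_mask)

mode = 0b1010_0101
new_mode = update_mode(mode, routine_bits=0b0000_0010, routine_mask=0b0000_0011)
```

A plain load would overwrite the bits chosen by the 68020; the masked update keeps them, which is the reason the text asks for masking.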

Implementation  Notes 

Having the 68020 start the VSP chips one at a time executing application
code presents many alternatives. Since most applications start vector loads
early in the code, the 68020 may have difficulty getting the bus to start the
second VSP chip on each bus. This will delay getting some of the chips
started. With a start pattern that first starts one chip on each bus, this
can be minimized but may still be significant. It would also be convenient
for debugging under manual control if the application were started by a single
event. For this reason, each VSP chip will be started in a polling loop that
waits for a status bit from the 68020 to be set as the signal to proceed to
the application code.
Just as with the setting of the finished bits at the end of the
application, this synchronization should not be included in any of the
application subroutines. The polling loop should be separate from them.
There are two ways this can be accomplished. One is for the polling routine
to end with a jump to the start of the application. The other is to add the
starting address of the application to the bottom of the stack, start the
stack pointer one lower, and perform a return instruction to get to the
desired application. The latter is better because it simply requires adding
to the artificial stack frame that must already be prepared rather than
modifying a jump instruction in the polling routine. The same polling routine
can be used for different applications. The same routine can also be used even
if the two VSP chips sharing it must start at different addresses.

Here is the necessary starting procedure. Each VSP chip is assigned a
stack area. This is initialized by "pushing" the start address of the FINISH
routine, followed by any parameters being passed to the application, followed
by the start address of the application. The $SP register of each VSP chip is
set to point to the address below the end of the stack. The 68020 status bit
used for starting is set false. Each VSP is started executing from the
beginning of the START routine. When the start bit is set true, all of the
VSP chips will exit the polling loop in START. They will "return" to the
application code. When the application is done, they will "return" to the
routine that sets the VSP status bits to indicate that they are finished and
halt.
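
The order of events in this starting procedure can be modeled with a short simulation. The sketch below is written in Python rather than VSP code, and everything in it apart from the START/FINISH ordering is an assumption made for illustration: the addresses, the routine table and the function names are hypothetical.

```python
def build_start_stack(finish_addr, params, app_addr):
    """Artificial stack frame: FINISH address first, parameters next, application last (on top)."""
    return [finish_addr, *params, app_addr]

def run_vsp(stack, start_bit_samples, routines):
    """Model START: poll the start bit, then 'return' through the prepared stack."""
    trace = ["START"]
    # Polling loop: nothing happens until the 68020 sets the start bit.
    if not any(start_bit_samples):
        return trace
    while stack:
        addr = stack.pop()                    # "return" pops the next address
        name, n_params = routines[addr]
        # The routine consumes its own parameters from the stack.
        args = [stack.pop() for _ in range(n_params)][::-1]
        trace.append((name, args))
    return trace

# Hypothetical addresses and parameter counts.
routines = {0x100: ("FINISH", 0), 0x200: ("FFT", 2)}
stack = build_start_stack(0x100, [8, 1024], 0x200)
trace = run_vsp(stack, [False, False, True], routines)
```

Because the application address is just one more entry in the frame, the same START routine serves any application and any start address, as the text notes.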
5.0 MicroASM


A study was made of several commercially available micro assembler
packages. Previous reports have referenced HALE (Hilevel Assembly Language
Environment) and compared it to MicroASM. The following is a comparison of
the MicroASM system and another popular microassembler - the Microtec Meta29M
2900 Macro Meta Assembler.

Meta29M was developed primarily for the AMD 2900 series
microprogrammable microprocessors and thus is really aimed at different
problems than MicroASM; however, it is representative of most microassemblers
available today. Like MicroASM, it utilizes a two-stage system consisting of
a Definition phase and an Assembly phase.

The Definition phase allows instruction mnemonics and their associated
formats to be defined along with constants and reserved symbolic names. The
Definition program checks the definitions for validity and issues error
messages when errors are found. The Definition program features conditional
assembly directives, complex expression evaluation and a cross reference table
listing.

The Assembly phase is a two-pass program that builds a symbol table,
issues error messages, produces an easily read program listing and symbol
table, and generates an object module. The Assembly program also features
conditional assembly directives, complex expression evaluation, and a cross
reference table listing.

Meta29M supports a macro facility. Through the use of macros, variable
length microwords may be defined, fields may be broken up into non-contiguous
bit patterns, and single mnemonics may be used to represent complex overlaid
instruction formats. Conditional assembly statements may be used in
conjunction with macros to implement multi-purpose macros. Macros may be
recursive and may be redefined at any point in the program.

There are, however, some serious limitations to Meta29M that make it
inappropriate for architectures such as the CPH and wide microword
architectures in general. The Meta29M Definition language is really nothing
more than a simple macro language consisting primarily of the "EQU" and "DEF"
directives. These are used in the following manner:

ABAT:  EQU  H#50             ; Define a constant ABAT = 50 hex

ADD:   DEF  H#5,ABAT,4VH#    ; Define an instruction mnemonic ADD

Note that all mnemonics are globally defined - that is, a mnemonic may be
used in only one context. While this may be sufficient for microprocessors,
it is a serious limitation in wide instruction word architectures where it is
not uncommon to have in excess of 1024 instruction bits and multiple similar
fields for similar resource control (say several identical multipliers). In
these situations it is convenient to have identical mnemonics for each similar
resource with no conflicts. In addition, wide-word architectures are
typically "field-oriented" where the instructions are logically broken into
fields for ease of programming. Thus an ADD may be accomplished in any number
of ways using any number of resources (i.e., there may be multiple ADDs in
multiple fields). A simple macroing scheme is inadequate to this task.

MicroASM begins with the concept of logical fields. Any logical field
may have any number or level of subfields. Mnemonics defined for any field
(or subfield) are local to that field and may therefore be used in any number
of different contexts without ambiguity. Logical fields are then mapped to
the actual physical fields of the microword. This may be as simple as a
direct one-to-one relationship or a complex relationship involving any number
of logical fields. It should be noted that Meta29M also supports complex
expressions for mnemonic definitions; however, these expressions are limited in
that they cannot use parentheses, cannot directly reference a "field" and
support a very limited set of operators. MicroASM supports the complete set of
ANSI C language operators (arithmetic, logical and bitwise) with the addition
of three MicroASM specific operators (EITHER, CAT and PARITY).

One of the more serious limitations of Meta29M is that it does not
support polyphase system clocks, which are increasingly common in
multiprocessor parallel architectures. Specifically, the CPH uses a two-phase
system clock, and thus cannot make use of an assembler like Meta29M.

MicroASM's definition stage actually defines the fields in the microword
and constructs all of the necessary symbol tables for the Assembly phase.
This allows the Assembly phase to execute far quicker than a system where the
symbol tables must be constructed at run time. Also, MicroASM uses a macro
preprocessor which allows conditional assembly as well as complete macro
capabilities. Another capability provided by the MicroASM preprocessor and
not supported by Meta29M is the ability to "include" other source files at
assembly time. This allows the user much greater flexibility in source file
control - i.e., all constants may be placed in a single "include" file and
used with any number of other source files.

Another important feature not supported by Meta29M is the automatic
support of different number formats. While both Meta29M and MicroASM allow
the specification of numbers in Binary, Octal, Decimal and Hexadecimal,
MicroASM also allows the specification of floating-point numbers in IEEE
single and double precision as well as DEC F and DEC G formats. In addition,
MicroASM supports a Pragma to specify whether numbers are big endian or little
endian (see Section 4.6 of this report). Another important feature supported by
MicroASM alone is the "PARITY" field operator whereby any physical field may
be mapped as the parity of any combination of logical fields. This is
increasingly important for the efficient programming of fault tolerant
architectures. Specifically, the CPH uses parity for memory checking, thus
this feature is important.

Finally, the level of error checking that is possible with MicroASM is a
significant improvement over Meta29M, which can only check to see that the
final value of the microword is the proper length and that the internal
Meta29M syntax has not been violated. MicroASM can detect fields that are
referenced in the wrong phase, or for the wrong number of phases. It can
enforce specific latency times for different fields or mnemonics. It allows
the definition of default values at any level and even warns the user of
suspicious activity (i.e., using a decimal number to define a mnemonic - not
illegal but certainly uncommon).
5.1 Overview


The MicroASM system consists of two programs. GENASM generates the
symbol tables specific to each micro-architecture that is defined using the
MicroASM definition language. GENASM compiles this language and populates the
symbol tables. MICROASM uses the symbol tables to assemble code that uses the
mnemonics and logical fields defined and compiled by GENASM.

5.2 GENASM Program - Definition of Microword Fields and Mnemonics

The central concept of MicroASM is the idea of Logical Fields and
Physical Fields. Logical fields are fields defined by the microprogrammer and
are actually referenced in the micro-assembly code itself. Physical fields
represent the actual physical segments of the microword. The definition phase
of MicroASM involves defining the Logical fields, subfields and mnemonics that
conceptually describe the underlying hardware and then mapping these Logical
fields to the Physical fields. This is done by using the MicroASM definition
language which is compiled by the GENASM program to produce the tables
required by the MICROASM micro-assembler program.

5.3 MicroASM Definition Language

The GENASM definition language is designed as a structured, block
oriented language in the spirit of C. In fact, actual C syntax is used for
some definition syntax. This language is completely position independent and
all white space is ignored by the compiler; thus easily readable programming
"styles" are encouraged but not enforced. This language essentially does two
things: it allows the definition of Logical fields, along with their
associated subfields and mnemonics with no concern as to the "physical
position" of the fields, and then allows the mapping of these Logical fields
onto the actual physical microword.

5.3.1 GENASM Case Sensitivity

GENASM can compile the definition language either case sensitive using a
command line switch (-c) or case insensitive (default). When in case
sensitive mode, nothing is translated and all keywords are in lower
case. When in case insensitive mode all characters are converted to lower
case.

5.3.2 Comments


Comments in GENASM (and MICROASM) are delimited exactly the same as they
are in the C language: comments begin with /* and end with */. Any other
character sequences including new lines or carriage returns are acceptable as
comments within the delimiters and are simply ignored at compile or assembly
time. Nesting of comments is not allowed.

Example:  /* This is a comment */

          /* This is also a comment that
             ends down here. */
5.3.3 Numerical Values


Any numerical value associated with the MicroASM definition phase will
always be an unsigned integer. The definition language supports four common
number bases: binary, octal, decimal and hexadecimal. For octal, decimal and
hexadecimal numbers the specification is identical to that used by the C
language; binary numbers are specified in a similar, consistent manner. The
syntax for specification of each is as follows:

Binary: 0bbin_num where bin_num is any valid binary number (i.e., each digit
must be either a 0 or a 1) prefixed by 0b. Example: 0b11011

Octal: 0oct_num where oct_num is any valid octal number (i.e., each digit must
be between 0 and 7) prefixed by 0. Example: 0642

Decimal: dec_num where dec_num is any valid decimal number (i.e., each digit
must be between 0 and 9) NOT prefixed by 0. Example: 642

Hexadecimal: 0xhex_num where hex_num is any valid hexadecimal number (i.e.,
each digit must be between 0 and 9 or between A and F) prefixed by 0x.
Example: 0x642A
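
The four literal forms above can be recognized with a small prefix test. The following Python sketch is an illustration only and is not the GENASM parser; the function name parse_genasm_int is hypothetical, and accepting lower-case hexadecimal digits and an upper-case prefix is an assumption.

```python
_DIGITS = "0123456789abcdef"

def parse_genasm_int(text: str) -> int:
    """Parse an unsigned integer literal written in binary, octal, decimal or hexadecimal."""
    t = text.strip()
    low = t.lower()
    if low.startswith("0b"):
        digits, base = low[2:], 2
    elif low.startswith("0x"):
        digits, base = low[2:], 16
    elif low.startswith("0") and len(low) > 1:
        digits, base = low[1:], 8        # leading 0 means octal, as in C
    else:
        digits, base = low, 10
    # Reject signs, empty digit strings and digits outside the base.
    if not digits or any(c not in _DIGITS[:base] for c in digits):
        raise ValueError(f"invalid numeric literal: {text!r}")
    return int(digits, base)
```

For instance, `parse_genasm_int("0642")` is read as octal and `parse_genasm_int("642")` as decimal, which is the distinction the leading zero is meant to carry.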

5.3.4 Definition of Global Parameters

In any MicroASM definition there are three global parameters: width,
phases, and defbit.

width specifies the actual width in bits of the physical microword using the
following syntax:

width = num

where num is an integer (between 1 and 2^^) in any of the acceptable number
bases. Failure to specify microword width results in an error.

defbit specifies the default bit value to be used whenever a value is not
explicitly specified for any field. The specification syntax is as follows:

defbit = num

where num is either 0 or 1 in any of the acceptable number bases. defbit is
optional, but there is NO DEFAULT VALUE. Thus if defbit is omitted, any
unspecified bits in the assembly phase will generate an error. To aid in
program debugging, a warning is generated each time the global defbit value is
used automatically.

5.3.5 Logical Field Definition

A logical field is a segment of a microword that may be named to reflect
its nature - i.e., "ALU_1" or "SEQUENCER". A logical field may have
associated with it mnemonics, and a default value that is implied whenever the
field is active but no value is explicitly assigned to it. A logical field
may also have any number of nested subfields - each with their own mnemonics
and defaults. In addition a logical field (or subfield) may be defined to be
"active" for a specified number of clock phases. The syntax for logical field
definitions is as follows:

fieldname_1 [fld_width] #act_phases1
fieldname_2 [fld_width] #act_phases2
...
fieldname_n [fld_width] #act_phasesn
{
    field definition - subfields and mnemonics
}

fieldnames are valid unique identifiers (relative to their parent block). The
syntax of multiple fieldnames is used to specify fields, probably mapped to
different parts of the physical microword, that have the same subfields and
mnemonics without having to duplicate the entire field definition. fld_width
is an integer (between 1 and 2^^) in any of the acceptable number bases
delimited by square braces "[" and "]". Note that the sum of the widths of
all children fields must be less than or equal to the width of the parent
field.
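
The width rule lends itself to a simple recursive check. The sketch below is a hypothetical Python model of such a check, not GENASM's own implementation; the Field class and check_widths function are names invented for the illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Field:
    name: str
    width: int
    children: list["Field"] = field(default_factory=list)

def check_widths(f: Field) -> list[str]:
    """Return one message for every field whose children are wider in total than the field."""
    errors = []
    total = sum(c.width for c in f.children)
    if total > f.width:
        errors.append(f"{f.name}: children use {total} bits, field has {f.width}")
    for c in f.children:
        errors.extend(check_widths(c))   # apply the same rule at every nesting level
    return errors

# A 5-bit field with a 3-bit and a 2-bit subfield satisfies the rule.
ok = Field("alu_1", 5, [Field("cont", 3), Field("check", 2)])
# A 4-bit field with the same children does not.
bad = Field("bad", 4, [Field("a", 3), Field("b", 2)])
```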


The syntax for logical subfield definitions is identical to parent field
definitions - i.e., all field definitions are identical. The only difference
is that subfields are defined within the parent field's definition block.

Note that the logical "position" or "offset" within the parent field is
determined by the order in which the subfield is defined. This is important
in that when mnemonics are specified in MICROASM (the assembly phase) the
order of fields referenced is determined by this definition order.

5.3.6  Direct  Field  Definition 

In the case where it is desired to define a block (of subfields and
mnemonics) for a set of differing subfields of different fields, the MicroASM
indirection syntax may be used. This syntax is similar to the C "struct"
reference syntax.

parent1.child1.childn [fld_width] #act_phases1
parent2.child2.childm [fld_width] #act_phases2
...
parentn.childy [fld_width] #act_phasesn
{
    field definition - subfields and mnemonics
}

where childn is referenced as a child subfield of child1 which is, in turn, a
child subfield of parent1. Parental precedence descends from right to left
with the leftmost field specified being the global field level parent and the
rightmost field being the new subfield to be defined, with each field name
separated by a period ".". fld_width is an integer (between 1 and 2^^) in
any of the acceptable number bases delimited by square braces "[" and "]".
Note that the sum of the widths of all children fields must be less than or
equal to the width of the parent field. The act_phases specifiers are
optional and specify the number of phases during which the associated subfield
must be "active" or hold a value. If act_phases is specified then all of the
subfield's children (subfields and mnemonics) will be assumed to be active for
act_phases as well. If act_phases is not specified then each child (or block
of children) may be specified with differing active phase specifiers.
act_phases is preceded by "#". The field definition can include subfield
definitions, mnemonic definitions and default values with the entire
definition block delimited with braces "{" and "}".

5.3.7 Mnemonic Definitions

A mnemonic is similar to a macro in that it serves to substitute a
numeric value for an identifier name. In MICROASM it differs from a macro in
that mnemonics are always local to their block (parent field), and serve to
define a FINITE SET of identifier-referenced values for the parent field. In
other words, if a set of mnemonics is defined for a field (this includes
global mnemonics or parent block mnemonics), then no other mnemonics will be
allowed to be used in reference to that field.

5.3.8 Defining Fields to Accept Address Labels

Some fields may need to accept address labels as well as mnemonics. These
labels are defined during the assembly phase in the micro assembly code
itself. These types of logical fields usually refer to address sequencers or
program counters. The syntax for defining a field that accepts labels is:

field def
{
    labels
}

The labels keyword may be included with mnemonic definitions in a field
definition. The labels keyword may appear anywhere that a mnemonic definition
can with the exception of global mnemonics. In other words, the global
microword may NOT accept labels. In addition, the field for which labels have
been specified must be of the proper size (as with any mnemonic definition).

5.3.9 Complete Field and Mnemonic Definition Example

The following is an example to illustrate the use of the GENASM definition
language.

/* GENASM Definition example */

width  = 64    /* 64-bit wide microword */
phases = 4     /* 4-phase system clock */
defbit = 0     /* When in doubt assign a 0 */

/* Logical Field Definitions */

/* Global level field 22 bits wide called multi */
multi [22]
{
    default = 0x0000            /* Default value for multi */
    xsel[3] #1                  /* Subfields of multi */
    ysel[3] #1
    {
        cache_a = 0b000         /* Mnemonics for xsel & ysel */
        cache_b = 0b010
        alu_1   = 0b110
    }
    insta[8]                    /* Subfield starting at bit 6 */
    {
        mult = 0b1111000 #2     /* Active for 2 phases */
        div  = 0b0011000 #4     /* Active for 4 phases */
        ins_flag[1] #1          /* Single bit subfield of insta */
        {
            real = 0b0
            imag = 0b1
        }
        rest[7]
        {
            tia = 0b0001110
            tib = 0b1110010
        }
    }
    instb[8] #2
    {
        clear = 0b0000000
        load  = 0b1111111
    }
    check[2]                    /* Subfield using 2 bits */
    {
        ready = 0b01
        set   = 0b10
        go    = 0b11
    }
}

/* Field 5 bits wide called alu_1 */
alu_1 [5]
{
    default = 0b11111
    cont[3] #2                  /* Three bit subfield */
    {
        on  = 0b111             /* Local mnemonics for subfield */
        off = 0b000
    }
    check[2]
    {
        go    = 0b11
        clear = 0b00
    }
}
5.3.10 Specification of Logical Field to Physical Field Mapping


Once the logical fields are defined they must then be mapped onto the
actual physical microword. Unlike logical fields, which may have as many bits
as is conceptually expedient, physical fields are constrained by the actual
hardware for which the MicroASM tables are being defined.

5.3.11 Assigning Logical Fields to Physical Fields

The assignment of logical fields to physical fields is done using the
assign statement. These statements have the following syntax:

assign (offset_spec) @ (phase_spec) = field_spec;

The assign keyword is followed by the offset_spec which defines the
absolute position within the physical microword that the physical field
occupies. offset_spec can take any combination of the two distinct offset
forms - contiguous form and individual form - delimited by parentheses "(" and
")". phase_spec uses a syntax identical to the offset_spec to specify
absolutely which phases the physical field may become active in. The
phase_spec is always preceded by an "@". The field_spec is a logical field,
list of logical subfields, or bitwise logical/arithmetic expression with
logical fields as operands. The entire expression is always followed by a
semicolon ";". The semicolon syntax for "end of statement" is included since
in many cases these assign statements will occupy multiple lines and the "end
of statement" is easier and more compact than "line continuation" schemes.

5.3.12 Absolute Phase Specifiers (not implemented yet)

Absolute phase specifiers determine the phases during which a physical
field may become active. This allows the definition of physical fields that
control completely different hardware functions in different clock phases, or
the definition of fields that can alternately carry instructions and immediate
data in different phases. Absolute phase specifiers for physical fields can
take two forms. The syntax for both forms is as follows:

Contiguous form: (first:last)

where first is the first phase during which the physical field may become
active and last is the last phase during which the physical field may become
active.

Individual form: (phase1,phase2,phasen)

where phase1 through phasen are individual absolute phases during which the
physical field may become active.

A valid absolute phase specifier may include combinations of both forms
as in the following: (phase1,first:last,phase2)

Note that the combination of phase length specifiers from the logical
field definitions and these absolute phase specifiers can easily cause timing
clashes which cannot be effectively prevented or detected by the compiler.
Many polyphase machines have such complex timing schemes that there is no way
to automatically distinguish a timing mistake from a complicated system -
other than one works and one doesn't.

Note also that GENASM always SORTS and COMPRESSES any phase or offset
specification. Thus (0,5,4:1) is converted to (0:5). While, in general, this
simply promotes rational definition, it can lead to unexpected results.
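
The sort-and-compress behavior can be reproduced with a short routine. This is a Python sketch of the idea and not GENASM's code: the function names are hypothetical, and treating a reversed range such as 4:1 as 1 through 4, and compressing a run of only two values, are assumptions chosen to agree with the (0,5,4:1) example.

```python
def expand(spec: str) -> list[int]:
    """Expand a spec such as '(0,5,4:1)' into a sorted list of distinct integers."""
    values = set()
    for item in spec.strip("() ").split(","):
        item = item.strip()
        if ":" in item:
            a, b = (int(x) for x in item.split(":"))
            values.update(range(min(a, b), max(a, b) + 1))
        else:
            values.add(int(item))
    return sorted(values)

def compress(values: list[int]) -> str:
    """Render sorted integers as first:last for consecutive runs, single values otherwise."""
    parts = []
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and values[j + 1] == values[j] + 1:
            j += 1                       # extend the current run
        parts.append(str(values[i]) if i == j else f"{values[i]}:{values[j]}")
        i = j + 1
    return "(" + ",".join(parts) + ")"

def normalize(spec: str) -> str:
    return compress(expand(spec))
```

Under these assumptions `normalize("(0,5,4:1)")` gives `(0:5)`. The original ordering of the items is lost, which is one way the "unexpected results" mentioned above can arise.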

5.3.13 Field Specifications

The field_spec section of the assign syntax may be as simple as a single
logical field or as complex as a complete logical expression with any number
of logical fields as operands. These expressions are important for horizontal
compaction of microwords where single physical fields must be used in multiple
contexts to conserve microword width. The operators allowed in field
expressions are identical in syntax to the bitwise operators in C, with three
additional operators. These are:

&  - bitwise AND operator
|  - bitwise OR operator
^  - bitwise XOR operator
~  - bitwise negation (NOT)

The additional operators are:

cat - Concatenation operator

either - Allows physical field to be referenced by one of two logical fields
but not both simultaneously.

parity() - parity of some field spec.

When the GENASM compiler encounters a field expression it stores the
expression in a table. The expression is evaluated at runtime by MICROASM
whenever the pertinent fields are referenced. Any logical field may be
involved in any number of expressions as long as there are no obvious
conflicts; however, care should be taken when using logical fields in multiple
expressions as undetectable clashes are possible.
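
The intent of the three additional operators can be shown with plain integer arithmetic. The functions below are a Python sketch under stated assumptions, not MICROASM's evaluator: the bit order used by cat, the use of None to mark an unreferenced field in either, and the choice of odd-count-gives-1 for parity are all assumptions made for the example.

```python
def cat(hi: int, lo: int, lo_width: int) -> int:
    """Concatenate two field values; hi is placed above the lo_width bits of lo."""
    return (hi << lo_width) | (lo & ((1 << lo_width) - 1))

def either(a, b):
    """Value of whichever logical field was referenced; referencing both is an error."""
    if a is not None and b is not None:
        raise ValueError("both logical fields referenced simultaneously")
    return a if a is not None else b

def parity(value: int) -> int:
    """1 if value has an odd number of set bits, otherwise 0."""
    return bin(value).count("1") & 1

# Bitwise AND of two 3-bit values, using mnemonic values from the definition example:
# cache_b (0b010) for multi.xsel and on (0b111) for alu_1.cont.
anded = 0b010 & 0b111
```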

5.3.14 Assigning Logical Fields to Physical Fields Example

The following uses the fields defined as an example of how the assign
syntax is used.

/* Physical field assignments */

/* Assign multi fields except instb to absolute */
/* bits 0-15 becoming active in phase 0 or 1.   */
assign (0:15) @(0:1) = multi.xsel,ysel,insta;

/* Assign multi fields except insta to absolute */
/* bits 0-15 becoming active in phase 2 or 3.   */
assign (0:15) @(2,3) = multi.xsel,ysel,instb;

/* Abs bits 16-18 = 3-bit subfield xsel of      */
/* multi ANDed with subfield cont of alu_1.     */
assign (16:18) @(0) = multi.xsel & alu_1.cont;

/* Abs bits 20 and 22 are either multi.check    */
/* or alu_1.check but not both.                 */
assign (20,22) @(0) = multi.check EITHER alu_1.check;

5.4 MICROASM Program

The MICROASM program allows the user to write programs referencing the
logical fields and mnemonics as defined in the GENASM program. The basic
format for MICROASM statements is as follows:

@act_phas1 fld_spec1 m1,(m11,m12),m3,mn
...
@act_phasn fld_specn mn1,mn2,(mn11,mn12),mn3

Where act_phas is the phase for which the following mnemonics are
applied. It is preceded by "@". fld_spec is a parent field specifier and may
be as simple as a global field name ("multi") or it may be a direct subfield
reference (multi.xsel). The following mnemonics (m1, .. mn) are arranged in
the order that their parent fields were defined. When parentheses are added,
this indicates that the mnemonics contained within the parentheses belong to a
child field of the current level. The following illustrates these concepts:

@0 multi cache_a,cache_b,(real,tia),set

Note that cache_a is a mnemonic defined for multi.xsel, cache_b is a
mnemonic defined for multi.ysel, real is a mnemonic defined for
multi.insta.ins_flag, tia is a mnemonic defined for multi.insta.rest, and set
is defined for multi.check.
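
One way to picture how a mnemonic list is paired with fields is sketched below in Python. This is not MICROASM's algorithm, which the report does not spell out; it assumes that each item is given to the next subfield, in definition order, that accepts it, so that a subfield such as instb can be skipped. The MULTI table and the resolve function are hypothetical.

```python
# (subfield name, mnemonics, child subfields) in definition order,
# transcribed from the multi field of the definition example.
MULTI = [
    ("xsel", {"cache_a", "cache_b", "alu_1"}, []),
    ("ysel", {"cache_a", "cache_b", "alu_1"}, []),
    ("insta", {"mult", "div"}, [
        ("ins_flag", {"real", "imag"}, []),
        ("rest", {"tia", "tib"}, []),
    ]),
    ("instb", {"clear", "load"}, []),
    ("check", {"ready", "set", "go"}, []),
]

def resolve(prefix, subfields, items):
    """Pair each item with the next subfield, in definition order, that accepts it."""
    result = {}
    pos = 0
    for item in items:
        while pos < len(subfields):
            name, mnemonics, children = subfields[pos]
            pos += 1
            path = f"{prefix}.{name}"
            if isinstance(item, list) and children:
                # A parenthesized group belongs to the children of this subfield.
                result.update(resolve(path, children, item))
                break
            if isinstance(item, str) and item in mnemonics:
                result[path] = item
                break
        else:
            raise ValueError(f"no remaining field accepts {item!r}")
    return result

# "@0 multi cache_a,cache_b,(real,tia),set" with the parenthesized group as a nested list.
assignment = resolve("multi", MULTI, ["cache_a", "cache_b", ["real", "tia"], "set"])
```

Under this assumption the statement resolves to the same five field references listed in the note above.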

An alternative structure is:

@act_phasn fld_specn = mn;

where the "=" implies that mnemonic mn belongs directly to fld_specn.

5.4.1 References to Immediate Data Values

Since  a  mnemonic  Is  actually  an  Identifier  associated  with  an  actual 
numeric  value,  any  mnemonic  can  be  replaced  by  an  actual  numeric  value 
(assuming  the  field  referenced  Is  large  enough) .  In  addition  to  the  Integer 
nximber  base  specifications,  MICROASM  accepts  floating-point  data  that  Is 
automatically  converted  to  the  floating-point  format  specified.  The  following 
format  syntax  Is  supported: 

0sfp_num  -  Single precision (32-bit) IEEE floating-point
0dfp_num  -  Double precision (64-bit) IEEE floating-point
0ffp_num  -  DEC F Single precision (32-bit) floating-point
0gfp_num  -  DEC G Double precision (64-bit) floating-point

Example: 0d156.4632e4 would be represented in the microcode as a double
precision IEEE format number.
This syntax allows the use of various floating-point formats
unambiguously in the same microprogram.
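
The conversion of an IEEE literal can be sketched in Python as follows. This
is an illustration only, not MICROASM code: the function name
encode_fp_literal is hypothetical, big-endian packing is an assumption, and
the DEC F and G encodings are deliberately left out.

```python
import struct

# Prefix letter -> (pack format, unpack format, width in bits).
# Only the two IEEE prefixes from the table are handled in this sketch.
_IEEE_FORMATS = {"s": (">f", ">I", 32), "d": (">d", ">Q", 64)}


def encode_fp_literal(literal):
    """Return (bit_pattern, width) for a literal such as '0d156.4632e4'."""
    if len(literal) < 3 or literal[0] != "0":
        raise ValueError("not a floating-point literal: %r" % literal)
    kind = literal[1].lower()
    if kind in ("f", "g"):
        # DEC F/G formats use a different layout and are outside this sketch.
        raise NotImplementedError("DEC formats are not modelled here")
    if kind not in _IEEE_FORMATS:
        raise ValueError("unknown format prefix: %r" % literal[:2])
    pack_fmt, unpack_fmt, width = _IEEE_FORMATS[kind]
    value = float(literal[2:])
    # Pack as a float, then reinterpret the same bytes as an unsigned integer.
    (bits,) = struct.unpack(unpack_fmt, struct.pack(pack_fmt, value))
    return bits, width


bits, width = encode_fp_literal("0d156.4632e4")
print("%0*X (%d bits)" % (width // 4, bits, width))
```

Because the prefix selects the encoding, a single-precision and a
double-precision constant can sit side by side in one microprogram without
ambiguity.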

5.4.2  Labels

Labels  are  used  to  mark  positions  within  the  microprogram  for  sequencer 
jumps  and  program  branches.  The  syntax  is: 

label : 


Where label is an unambiguous identifier to be associated with the
address of its occurrence in the microprogram, followed by a colon.
Labels may be referenced in the same way as mnemonics in fields which have
been defined to accept labels. Use of a label in reference to a field which
has not been defined as accepting labels will generate an error.

5.4.3  Absolute and Relative Addressing

There are two methods of programming jumps and branches: absolute
addressing and relative addressing. Absolute addressing simply jumps
to the address (i.e., label reference) specified. Relative addressing,
however, calculates the offset from the current position to the address
specified, and this offset is the value stored in the microcode. Note that
offsets can be negative for backward jumps. The syntax used for absolute
addressing is:

addr_spec

where addr_spec is a label reference or an immediate value.

The syntax for relative addressing is:

[addr_spec]

where addr_spec is a label reference or an immediate value.

The following example illustrates:


start:

@0 multi cache_a,cache_b,mult,go

/* Loop to start by jumping to start's address */
seq long_jmp,start;

@0 multi cache_a,cache_b,mult,go

/* Loop to start by adding the offset of difference between the */
/* current location and start's address to the sequencer.       */
seq short_jmp,[start];
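
The difference between the two addressing forms can be sketched in Python.
This is a hedged illustration: resolve_operand is a hypothetical helper, and
it assumes the offset is measured from the address of the jumping
microinstruction itself, which the text does not state explicitly.

```python
def resolve_operand(operand, labels, current_addr):
    """Return the value that would be stored in the microcode for a jump operand.

    operand      -- 'addr_spec' (absolute) or '[addr_spec]' (relative)
    labels       -- dict mapping label names to microprogram addresses
    current_addr -- address of the microinstruction containing the jump
    """
    relative = operand.startswith("[") and operand.endswith("]")
    name = operand[1:-1].strip() if relative else operand.strip()
    # addr_spec is either a label reference or an immediate value.
    target = labels[name] if name in labels else int(name, 0)
    # Relative form stores the (possibly negative) offset, absolute the address.
    return target - current_addr if relative else target


labels = {"start": 0x10}
print(resolve_operand("start", labels, 0x14))    # absolute address of start
print(resolve_operand("[start]", labels, 0x14))  # negative offset, backward jump
```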
5.4.4  Expressions

Any MICROASM statement may contain arithmetic or Boolean expressions
that follow the same operator precedence and construction rules as C. There
is no limit on the complexity or nesting of the operations, but there
is a limit on the size of the result: any result that overflows the size
defined for it will generate an error. The following operators, listed in
descending order of precedence, are supported:

!  Boolean bitwise negation (NOT)

*  Arithmetic  multiplication 

/  Arithmetic  division 

%  Arithmetic remainder (modulus)

+  Arithmetic  addition 

-  Arithmetic  subtraction 

&  Boolean  bitwise  AND 

^  Boolean  bitwise  Exclusive  OR  (XOR) 

|  Boolean bitwise OR
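
The overflow rule can be sketched as a small range check in Python. This is a
minimal illustration under stated assumptions: check_field and its signed
option are hypothetical names, and the report does not describe how MICROASM
actually performs the check.

```python
def check_field(value, width, signed=False):
    """Return value encoded in a field of `width` bits, or raise on overflow."""
    if signed:
        lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    else:
        lo, hi = 0, (1 << width) - 1
    if not lo <= value <= hi:
        raise OverflowError(
            "result %d does not fit in a %d-bit field" % (value, width))
    # Two's-complement encoding of the accepted value.
    return value & ((1 << width) - 1)


# Python's * % + - & ^ | share C's relative precedence, so this expression
# groups as ((3 + 5) * 2) & 0xF.
print(check_field((3 + 5) * 2 & 0xF, 4))
# A negative relative offset stored in a signed 8-bit field.
print(hex(check_field(-4, 8, signed=True)))
```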

5.5  MicroASM

MicroASM uses a C type preprocessor to implement macros and conditional
assembly. This is a text processor that manipulates the text of a source file
as the first stage of assembly. Although MICROASM ordinarily invokes the
preprocessor in its first pass, the preprocessor can also be invoked as a
stand-alone program.

5.5.1  Preprocessor Directives

The MicroASM preprocessor recognizes the following directives:

#define
#undef
#if
#ifdef
#ifndef
#elif
#else
#endif
#include
#pragma

The pound sign "#" must be the first non-white-space character on the
line containing the directive. Several of these directives require an
argument or value. Any text that follows a directive that is not part of its
argument or value must be enclosed in comment delimiters "/*" and "*/".

5.5.2  Constants and Macros

The #define directive is used to create constants and macros. Its
syntax is:

#define mac_name subst_text

#define substitutes subst_text for all subsequent occurrences of mac_name that
can be interpreted as tokens that are encountered in the source text. In
other words, mac_name is replaced by subst_text wherever it is encountered in
the text following the #define directive unless it is enclosed in parentheses
or is part of a longer identifier. The following example illustrates:

/* Original Source Code */

#define PI 0s3.14159


@0 alu add,cachea,cacheb,PI

/* Source Code after Preprocessing */


@0 alu add,cachea,cacheb,0s3.14159
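
The token rule (replace whole identifiers only, never part of a longer
identifier) can be sketched in Python. This is an assumption-laden
illustration: substitute is a hypothetical helper, it ignores comments and
the parentheses exception, and it does not rescan the substituted text.

```python
import re

# An identifier that is not glued to a preceding letter, digit or underscore,
# so the "s3" inside a literal such as 0s3.14159 is never treated as a token.
_IDENT = re.compile(r"(?<![A-Za-z0-9_])[A-Za-z_][A-Za-z0-9_]*")


def substitute(line, macros):
    """Replace each whole identifier that names a macro with its text."""
    return _IDENT.sub(lambda m: macros.get(m.group(0), m.group(0)), line)


macros = {"PI": "0s3.14159"}
print(substitute("@0 alu add,cachea,cacheb,PI", macros))
# PIPE is a longer identifier, so only the standalone PI is replaced.
print(substitute("PIPE,PI", macros))
```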

5.5.3  Undefining Macros or Constants

The #undef directive removes the definition of an identifier. Once the
definition is removed it can be redefined to a different value. This allows
the same macro or constant name to be used with different values in
different contexts in the same source code. The syntax is:

#undef mac_name

This syntax will remove the previous definition of mac_name which was
defined using a #define statement. The #undef directive is usually paired
with a #define directive to implement conditional or special case assembly.

5.5.4  Include Files

The #include directive inserts the contents of the specified file into
the source file at the point where the #include reference occurs. This allows
the organization of common constants and macros into "include files" which may
be #included into any number of MicroASM source files. There is a "standard"
include file called "std.inc" that comes predefined with MicroASM. This file
contains commonly used constants #defined in all of the different number
formats.

Another important use of include files involves including source modules
into a main driver module. This allows the use of smaller, easily manageable
source files which can all be included into a larger program.

The syntax is:

#include "file_spec"

or

#include <file_spec>

These two forms differ in the path search initiated by the preprocessor
for the file specified by file_spec if file_spec does not include a complete
path. The first form, which uses double quote delimiters, searches the parent
source file's directory first and then searches the "standard directories" as
defined via command line or system setup. The second form, which uses the
bracket delimiters "<" and ">", begins its search with the standard
directories.

Include files can be nested, i.e., an include file may itself contain
#include directives. When include files are nested, directory searching begins
with the directories of the parent and then proceeds through the directories
of any grandparents, and finally it searches the standard directories.
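
The search order described above can be sketched in Python. The function
names and the exists parameter are hypothetical, introduced only for
illustration; the sketch assumes the caller supplies the directory of the
including file first, followed by the directories of its own includers.

```python
import os


def include_search_dirs(delimiter, including_dirs, standard_dirs):
    """Return directories in the order they would be searched.

    delimiter      -- '"' for the quoted form, '<' for the bracket form
    including_dirs -- parent file's directory first, then grandparents
    standard_dirs  -- directories set via command line or system setup
    """
    if delimiter == '"':
        return list(including_dirs) + list(standard_dirs)
    if delimiter == "<":
        return list(standard_dirs)
    raise ValueError("unknown include delimiter: %r" % delimiter)


def find_include(file_spec, delimiter, including_dirs, standard_dirs,
                 exists=os.path.isfile):
    """Return the first matching path, or None if the file is not found."""
    if os.path.isabs(file_spec):
        # A complete path bypasses the directory search entirely.
        return file_spec if exists(file_spec) else None
    for directory in include_search_dirs(delimiter, including_dirs,
                                         standard_dirs):
        candidate = os.path.join(directory, file_spec)
        if exists(candidate):
            return candidate
    return None


print(include_search_dirs('"', ["proj/fft", "proj"], ["/usr/microasm/inc"]))
```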

5.5.5  Conditional Assembly

One of the most powerful features of the MicroASM preprocessor is
conditional assembly. This allows the use of a single source file for several
different applications (e.g., a single routine source may be assembled into
two versions, one using IEEE floating-point and the other using DEC
floating-point, by simply changing a single statement). The basic directives
that implement this feature are:

#if
#elif
#else
#endif

In addition, the defined() operator is used along with the shortened
concatenated forms

#ifdef
#ifndef

The syntax is:

#if const_expr
prog_text
#elif const_expr
prog_text


#elif const_expr
prog_text
#else
prog_text
#endif

Each #if directive must be matched by a closing #endif directive. Any
number of #elif directives can appear between the #if and #endif, but at most
one #else directive is allowed. The #else directive must be the last
directive prior to #endif. The preprocessor selects only one of the blocks of
prog_text, which can be any sequence of text occupying any number of lines.
Typically prog_text is MICROASM source code or preprocessor directives. If
the selected prog_text contains preprocessor directives, the preprocessor
carries them out; otherwise prog_text is passed to the assembler. Any
prog_text not selected by the preprocessor is ignored and thus is not
assembled or processed.

const_expr is a restricted constant expression that must involve strictly
constants (which may be #defined) and defined() values that resolve to an
integer value. The preprocessor selects a single prog_text block by
evaluating the const_expr restricted constant expression following each #if or
#elif directive until it finds a non-zero value. It then selects all text from
the #if, #elif or #else directive up to the next #elif, #else or #endif
directive.

The defined() operator and its shortened forms #ifdef and #ifndef use
the following syntax:

#if defined(mac_name)
prog_text
#elif defined(mac_name)
prog_text


#elif defined(mac_name)
prog_text
#else
prog_text
#endif

or alternatively

#ifdef mac_name
prog_text
#elif defined(mac_name)
prog_text


#elif defined(mac_name)
prog_text
#else
prog_text
#endif

These conditional blocks operate in exactly the same fashion as other
#if statements. The difference is that the condition is simply whether
mac_name has been previously #defined. The other forms, !defined() and
#ifndef, are satisfied if mac_name has NOT been #defined and are used in
identical fashion.
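
The block selection rule can be sketched in Python for a single, un-nested
conditional chain. This is a simplified illustration, not the MicroASM
preprocessor: the function names are hypothetical, and the condition
evaluator accepts only defined(NAME), !defined(NAME), an integer constant,
or a macro name whose value is an integer.

```python
import re


def eval_condition(expr, macros):
    """Evaluate a restricted condition and return an integer."""
    expr = expr.strip()
    m = re.fullmatch(r"(!?)\s*defined\s*\(\s*(\w+)\s*\)", expr)
    if m:
        is_defined = m.group(2) in macros
        negated = bool(m.group(1))
        return int(is_defined != negated)
    # A constant, possibly supplied through a #defined name.
    return int(macros.get(expr, expr), 0)


def select_text(lines, macros):
    """Return the prog_text lines chosen from one #if ... #endif chain."""
    selected, taken, active = [], False, False
    for line in lines:
        words = line.strip().split(None, 1)
        head = words[0] if words else ""
        arg = words[1] if len(words) > 1 else ""
        if head in ("#if", "#elif", "#ifdef", "#ifndef"):
            if head == "#ifdef":
                arg = "defined(%s)" % arg
            elif head == "#ifndef":
                arg = "!defined(%s)" % arg
            # Only the first non-zero condition selects its block.
            active = (not taken) and eval_condition(arg, macros) != 0
            taken = taken or active
        elif head == "#else":
            active = not taken
            taken = True
        elif head == "#endif":
            active = False
        elif active:
            selected.append(line)
    return selected


source = ["#ifdef IEEE", "fmt_ieee", "#elif defined(DEC)", "fmt_dec",
          "#else", "fmt_none", "#endif"]
print(select_text(source, {"DEC": "1"}))
```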
5.5.6  Local Assembler Directives

The preprocessor supports a method of embedding assembler directives
into the source assembler code. This is done using the #pragma directive.
The syntax is

#pragma direct_name

Following the #pragma directive, direct_name is a single identifier
identifying the assembler directive to be active beyond that point in the
code. At this time the only direct_names supported by MICROASM are the
floating-point byte/word order specifiers:

LITTLEENDIAN  Swaps low byte/high byte

BIGENDIAN     No byte or word swapping is done
6.0  Conclusions


The EVA architecture, composed of the VPH and the CPH subsystems, is
capable of gigaflop throughput for several reasons. Careful attention was
paid to the internal buses so that maximum data transfer can occur among the
boards. The typical board level IO bottleneck was reduced significantly. Use
of Gazelle hot rod chips with gigaflop clock rates and the fully parallel
crossbar chip made all the difference.

Several innovations were achieved in this SBIR Phase II. Among them
is the crossbar device with unparalleled speeds. The CPH architecture is
ultrafast due to the massively parallel internal datapath options (using the
crossbar). The VPH is a multi wave processing architecture. Fully concurrent
DSP processing is made possible. Use of novel packaging helped to reduce data
transfer bottlenecks. The micromemory modules, shown in the photograph in
Figure 54, made it possible to integrate more memory on the cache boards. Many
interfaces were necessary to interconnect the CPH to a PC, VME, and VPH. A
VME buffer board was designed and built to let the CPH converse with the VPH
and a VME bus. It is shown in the next photograph (Figure 55). Most
important of all was the crossbar chip, also shown in an accompanying
photograph (Figure 56). The crossbar chip, a 256 pin PGA ASIC, reduced board
space by eliminating numerous multiplexer devices.

6.1  VPH Performance and Demonstration

It was predicted at the end of the Phase I project that the VPH would
perform a 1k complex FFT in 800 usec. The board actually executes this FFT in
600 usec. This is largely due to careful hardware design and adroit
programming of the VPH by Larry Hall and Steve Sharp. Programming the 325s
proved to be a challenge because the available application library fit only
one device and not multiple devices. Nevertheless, once the wave concept was
mastered and used consistently, programming to optimize performance became
routine.

Code for convolutions, correlations, and coordinate transformations was
completed quickly. Using conventions for starting up and terminating DSPs
helped reduce the effort. The STARTUP and FINISH routines were created as
generic code segments so that they could be used over and over. The 68020 also
proved to be advantageous in controlling the synchronization. As a result, all
of the Phase I performance predictions were exceeded by at least 25%. Some of
the code performance is tabulated below.


Algorithm (4 DSPs)        Execution Time

1k Complex FFT                 604
64 Point Correlation            40
64 Point Convolution            42
8x8 2D FFT                      65
16x16 2D FFT                   270
32x32 2D FFT                   724
Polar to Rectangular            25
Rectangular to Polar            48


Figure 54. Micromemory Module


Figure 55. Serial I/O Board


Figure 56. Crossbar Device


Note  that  the  2D  FFTs  are  fast  enough  for  real-time  frame  grabbers  and 
CRT  displays  where  30  frames  a  second  are  often  viewed  without  flicker. 
Hence,  there  is  a  real  possibility  that  the  TSI  tracker  and  the  Space  Tech  VPH 
board  can  track,  focus,  and  translate  in  real-time  instead  of  off-line. 

The VPH, depicted in the following photograph (Figure 57), was
demonstrated at WSMR in the Instrumentation Development Directorate on 25
August 1992. The VPH was interfaced to a TSI single board computer inserted
into the VPH mainframe. A PC was used as a terminal for the VPH and the TSI
SBC had its own terminal. The demo consisted of transmission of data between
the VPH board (shown in Figure 58) and the SBC in either direction,
executing digital signal processing programs, and sending results to the PC
terminal and the SBC terminal.

Special drivers and utilities were generated. These drivers manipulated
data and programs from the PC so that the debugger in the SBC could access
them and display results on the SBC monitor. A section of the SBC memory
space was allocated for the VPH results and processed data was sent there.
Likewise, programs were downloaded from the SBC to the VPH to be executed by
the VPH. This demonstrated that the SBC could serve as system VME master or
controller. This also demonstrated that the VPH could be a VME slave in a
generic VME system. This is important for the VPH as it is also intended to
be interfaced to SUN workstations. An important device in the VPH greatly
facilitated the SBC/VPH interface, namely, the MVME 6000 VME interface chip
from Motorola.

The DSP programs that Space Tech created for the demo were a 1K FFT and a
correlation. Both programs were verified as functionally correct by comparing
results with independently generated outputs from C routines found in the
text Numerical Recipes in C. A roundoff utility, a word-by-word compare,
and an internal 16-bit fixed to IEEE floating-point data type conversion were
used to test the results. The 1K FFT output was within 5 decimal digits of
accuracy in all but 5 data points. The other 5 were within 4 fractional
decimal digits. The correlation results were within the same range of
precision. This is to be expected in both cases since the PC has a 16-bit
internal processor and the VPH has a 32-bit internal processor. The execution
times for these and other DSP routines are shown in the table above.

An important discovery of this demo is the need to map and translate
memory maps across the several domains. Those physical domains include the
EPROM space, register space, and data space of the ZORANs, the data and
program space of the 68020, and the data and program space of the SBC. Care
must be exercised when translating the correct hexadecimal literal values.
Tables are included in earlier sections to make the translations for the VPH.
It took Space Tech some time to determine the corresponding space for the SBC
and the MVME on it because little documentation existed.

The demonstration was executed by inserting the TSI single board
computer board into the VPH chassis as depicted in Figure 59. A Packard Bell
PC was used for the VPH CRT and keyboard while the SBC had its own terminal
and keyboard. The memory mapping described in the previous paragraph is
illuminated in this figure when we observe that the address space of the SBC
is 16 bits while that of the VPH is 32 bits. Hence, address modifier bits in
the MVME 6000 were used to perform much of the translation between the VPH
memory and the SBC memory. All of the standard VME bus control signals are
available on the VPH backplane and all were used in the demonstration.
However, only the frequently used control signals are shown in the figure.

The demonstration also consisted of exercising one, two, and four ZORAN
DSP chips separately and together. Because a transparent bus arbitration PAL
and scheme was designed into the VPH, it was relatively simple to turn single
or multiple DSPs on and off. The procedure is to have the 68020 set up the
status register in the VPH and let each ZORAN monitor its own "start" bit. If
the bit is set, the respective ZORAN chip initiates execution.
Otherwise, it is suspended. Likewise, when a DSP chip has completed its
current wave, it sets its "done" bit and stops. The 68020 monitors these bits
as the board master. It is also possible for any resource on or off the VPH
board to monitor these bits. Hence, the SBC can scan these bits as they are
found in the public domain of the VME backplane. This feature is very useful
when more than one master is exercising VME resources. This capability will
support the SBC tracking and the VPH processing data in real-time. The intent
of this design is to enable the VPH to process in the background while a front
end, like the SBC, is acquiring the data. The "Status.ASM" code in Appendix B
was used to demonstrate this capability.
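
The start/done handshake can be modelled with a toy Python sketch. It is
not the "Status.ASM" code: the class, the function, and in particular the bit
positions (start bits in bits 0-3, done bits in bits 4-7) are assumptions
made purely for illustration.

```python
NUM_DSPS = 4
START_SHIFT = 0  # hypothetical layout: start bits occupy bits 0-3
DONE_SHIFT = 4   # hypothetical layout: done bits occupy bits 4-7


class StatusRegister:
    """Toy model of a shared status register with per-DSP start/done bits."""

    def __init__(self):
        self.value = 0

    def set_start(self, dsp_mask):
        # The board master (the 68020 in the report) selects which DSPs run.
        self.value |= (dsp_mask & 0xF) << START_SHIFT

    def start_bit(self, dsp):
        return bool((self.value >> (START_SHIFT + dsp)) & 1)

    def set_done(self, dsp):
        self.value |= 1 << (DONE_SHIFT + dsp)

    def done_mask(self):
        return (self.value >> DONE_SHIFT) & 0xF


def run_wave(status, dsp_mask):
    """Start the selected DSPs, let each finish one wave, poll the done bits."""
    status.set_start(dsp_mask)
    for dsp in range(NUM_DSPS):
        if status.start_bit(dsp):
            # A real DSP would process its wave here before reporting done.
            status.set_done(dsp)
    # Any bus master able to read the register can perform this check.
    return status.done_mask() == dsp_mask


status = StatusRegister()
print(run_wave(status, 0b0101), bin(status.done_mask()))
```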
A number of other code segments were demonstrated. They include
CConv.asm, Rect2pol.asm, FFT2d8.asm, Pol2rect.asm, FFT2d16.asm, FFT2d32.asm,
Ccorr.asm, FFT1k.asm, Recip.asm, and Rconv.asm. All of these routines are
found in Appendix B. Others are included there and are useful for performing
diagnostics on the VPH. Furthermore, they can be used to help understand
coding the DSPs. Specifically, "Test1.asm" and "Test2.asm" are useful for
diagnosing the ZORAN chips and their interrupts, respectively.

For the 1K FFT, a random set of input points was chosen rather than a
known set of points. In this way the DSPs were demonstrated as to accuracy
and precision without any bias towards a known solution or output. The
results of the FFT were then compared with those obtained by feeding the same
input values to a standard C routine. The correlation program input used two
signals. One was a square wave followed by a triangular wave. The other was an
impulse function. The output of the correlator worked as expected. To verify
our intuitive conclusions, it was necessary to zero pad the front end of the
input data stream so that aliasing would not corrupt the interpretation.

6.2  CPH Conclusions

The CPH design effort was constantly buffeted by the technology
envelope. An aggressive design stance was chosen at first to capture any
and all new devices or promised devices. Among those were FPGAs and ASICs
with performance specifications untried by designers. When the point of no
return for fixing the design of the CPH came, some of the critical devices did
not live up to advanced performance specifications. As a result the CPH
design underwent more iterations than anticipated. The only conclusion to be
drawn is that designers should not push the technology envelope.

Unexpectedly, available devices became unavailable as manufacturers
became cost sensitized. Reduction of inventory became commonplace. The AMD
29540 FFT address sequencer, a staple for any FFT designer, was removed from
AMD's catalogs. A work around required over 40 16-pin chips. The large
number and size could not be supported by the available board space.
Subsequently, the address generator board would have to produce FFT addresses
in microcode. The VPH then became an important board to the EVA machine
because it was capable of very fast FFTs.

7.0  For Phase III

The CPH should be completed and fabricated. In doing so, the following
is recommended. Clock distribution on the backplane could be done with a
single TTL clock which will have low skew from board to board. Then, each
board could have a digital delay line to adjust for the skew. Use a double
rate clock so each board will develop 180 degree clocks for the quad. A
single stepper on the backplane would be useful. Install a switch or jumper
to make the selection between RUN and Single Step modes. Then debounce the
switch manually or use a trigger so single stepping can be done from some
external interface.

7.1  Backplane Design

The current EVA chassis has a 9U VME backplane and a custom backplane
for the CPH side. The custom backplane is not complete. In the next
development phase, this effort will require the selection of system clock
circuitry. That circuitry may be distributed physically across this
backplane. It will have to be, especially if ECL clocks are used. Six
differential ECL clocks should be used with the same clock timings as shown
for the cache memory board section. A provision for single stepping the
clocks should also be provided for diagnosing system faults. A status bit on
the PC-INT board may also be used here to control single stepping. Another
desirable option would be to freeze the clock at 40 MHz and return it to the
state just before the freeze. Obviously this will only be useful if entirely
glitch free operation can occur. Finally, terminations need to be designed
into the clock circuitry at the end of the lines so that overshoot is
suppressed.

7.2  Integration of the EVA Computer

The VPH can be a standalone board or hosted via its VME bus directly to
a SUN workstation or indirectly to a 6U VME system with a single board
computer. Integration involves more than hardware, however. The system
software of the host must be modified to make calls to the VPH, upload and
download code and data, and manage the throughput of the VPH. Because the VPH
is so fast, the VME bus is not the best choice. Another high speed bus can be
used if the host has the port. The VPH to CPH bus via the SIO channel uses
the Gazelle hot rod chip set with gigabyte transfer rates. A future effort
could examine the implementation of this path to a host.

If EVA is to be integrated fully to the TSI tracker, a good approach
would be to put the TSI 6U VME backplane into the EVA chassis. This will
allow the VPH to plug directly into the TSI backplane in one chassis and speed
up operations further. It is a simple matter to mechanically modify the EVA
chassis. Two new rails are necessary.

7.3  Crossbar Applications

The crossbar is an ultra fast switcher. It is general purpose so that
any digital data gateway can benefit from its dynamically reconfigurable
switching. The 12x14 IN/OUT paths are changeable in one single clock cycle.
No other crossbar can do this. Also, the crossbar is cascadable so that each
4-bit slice can be expanded into any word length desired. Telemetry gateways
may benefit from this remarkably fast switcher. Wide area telephone networks
could benefit from this powerful device. WSMR should consider a Phase III
technology transfer with this chip to applications Army-wide.

7.4  Cascadability

Cascadability can be supported for fixed-point arithmetic. Use one
block of memory and a processor board for the lower 32 bits of the 64-bit
number. Use another configuration for the upper 32 bits. Microcode will
have to be very sophisticated because the BIT chips do not provide all the
necessary signals and flags. Also, the CPH throughput will fall off
drastically. The better approach would be to use the 64-bit capability of the
BIT chip directly.
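
The reason cascading needs extra signals can be seen in a short Python
sketch of a 64-bit fixed-point add built from two 32-bit adds. The function
names are hypothetical and the sketch models only the arithmetic: the carry
out of the lower half must be passed to the upper half, which is the kind of
flag the microcode would have to carry between the two configurations.

```python
MASK32 = 0xFFFFFFFF


def add32(a, b, carry_in=0):
    """Add two 32-bit values; return (32-bit sum, carry out)."""
    total = (a & MASK32) + (b & MASK32) + carry_in
    return total & MASK32, total >> 32


def add64_cascaded(a, b):
    """Add two 64-bit values using a lower and an upper 32-bit stage."""
    lo, carry = add32(a & MASK32, b & MASK32)        # lower 32-bit stage
    hi, carry_out = add32(a >> 32, b >> 32, carry)   # upper stage needs carry
    return (hi << 32) | lo, carry_out


result, carry_out = add64_cascaded(0x00000000FFFFFFFF, 1)
print(hex(result), carry_out)
```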

Cascadability will require a local address bus so that the local CPH can
use the HSIO bus without conflicting with the other CPH HSIO bus. Currently,
the design supports an HSIO address that is broadcast everywhere. Additional
hardware will be needed.

The system initialization bit is on the cache memory board in IO space.
This bit will need to be set on the backplane and cleared by the HSIO. Hence,
the AG has to be the principal owner of this bit. Each cache bank will have
minor ownership. The IOP will need to monitor this bit so as to determine the
system configuration (where multiple CPHs are installed).

7.5  EVA Extensions

The EVA architecture will prove to be a durable concept for many years.
It should be completed to the extent possible by the new technological
advances. Newer FPGAs and ASICs will greatly reduce the board space. Better
transceivers will be available in late 1992. They should be considered for
the HSIO bus. Also, since the BIT 3130 and 3120 ECL ALUs are available, a
redesign of the CPH to include these 80 MHz devices may be advisable now that
the system issues of EVA are formulated. However, selecting ECL ALUs may
eliminate the need for the crossbars or require modifying them for
nonpipelined application. Caution is advised in choosing 3130s etc., because
these chips may also become unavailable in the future, possibly being
overtaken in performance by the GaAs devices.

To fully support EVA, the MicroAsm microprogramming tool should include
a linker and PROM formatter for the new PROMs. If the multiphase clocks for
the WCS are to be kept, then MicroAsm should be updated to support multiphase
microinstructions.

7.6  IOP Completion

The IOP is a general purpose IO traffic controller. The design can be
completed by adding the boot state machine and some MUX data clocks
(PCMUX, SIOMUX, HSIOMUX). Enables for counters A and B should be added to the
schematic. Also, the microsequencer design control signals need to be fully
timing analyzed and certified for race and hazard free operation. This is on
sheet 10 of the IOP schematic set.

To download microcode from the IOP to the processor, use the HSIO signal
lines labeled "DNLD ENABLE". At this point all boards should be in the
"available" mode. Two new signal lines should be designated on the HSIO bus.
They are DNLD REQT (from CPH to IOP) and DNLD FINISH (from IOP to CPH).

To upload condition codes of the processor board, use microcode bits to
enable same. The AG will read the error FIFOs on the processor. Since there
are many flags on the BIT chips (12), a 48 to 1 mux could be used to pass 1
flag only. Another flag could be the interrupt flag.

7.7  Wave Processing

The VPH application programs have been heavily optimized. However, there
is always room for improvement, especially when multiprocessing occurs. Some
of those improvements were noted in the VPH User's Manual. If additional VPH
boards are inserted into EVA, then wave processing can occur over 8 or 16 DSP
chips with an attendant increase in performance. New code can then take
advantage of this hardware extension.

7.8  VPH Augmented Bus

The VPH communicates with the CPH through the extra 32 bits in the VME
space via an augmented bus. In this manner true parallel 64-bit transfers
take place. For the SBC interaction across the 32-bit VME bus, this augmented
bus is not needed. Hence, firmware in the VPH PALs would have to be
regenerated if this augmented bus were to be activated. The PALs must allow
for redirecting the upper half of normally unused memory space (for the SBC)
back to the CPH address space. This is straightforward; only a simple PAL
reprogramming is necessary.

7.9  Phase III Opportunities

The VPH stands an excellent chance of technology transfer into many
digital signal processing applications. Chief among these are biomedical
imaging and seismic signal processing. Both commercial applications need
ultrafast FFTs, and both need FFTs of length over 1K. Seismic data processing
requires 1K x 1K 2D FFTs. The VPH can handle very large FFTs, but it might be
better to add memory to the board first. This will reduce the off-board data
traffic. New and denser memory chips are now available and can be used in a
mezzanine board for this purpose.
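
A large 2D FFT such as the 1K x 1K case mentioned above is commonly computed by row-column decomposition: transform every row, then transform every column of the result. The Python sketch below illustrates that decomposition on a small array. It is only an illustration under stated assumptions, not the VPH code: the function names `fft`, `fft2d` and `dft2d` are hypothetical, and the array size must be a power of two.

```python
import cmath


def fft(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft2d(a):
    """2D FFT of a square list of lists: FFT every row, then every column."""
    rows = [fft(r) for r in a]
    cols = [fft(list(c)) for c in zip(*rows)]
    # cols is indexed [column][row]; transpose back to [row][column]
    return [list(r) for r in zip(*cols)]


def dft2d(a):
    """Direct 2D DFT from the definition, used only as a slow reference."""
    n = len(a)
    return [[sum(a[r][c] * cmath.exp(-2j * cmath.pi * (u * r + v * c) / n)
                 for r in range(n) for c in range(n))
             for v in range(n)]
            for u in range(n)]
```

Because each row and each column transform is independent, the work divides naturally across several processors, which is the property the additional on-board memory would help exploit.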

The crossbar Phase III opportunities have been presented already. The
device itself should find many practical applications outside of computing.

APPENDIX A


CPE  PROGRAMS 




Date: 6/30/92
Size: 261


File: MULM.ASM

Last Modified: Thu Feb 06 13:54:22 1992


/*MULM.ASM*/

PROGRAM CODESEG MSRAM

ORG 0

start: $SEQ ,CONT
       $IMMADD 0x0001
       $ADDRA IMM,P0
       $ADDRB IMM,P0
       $CACHE ,R0J^
       $REGAR,,BR;
       $SEQ ,CONT
       $M1 AR,P0MS,BR,P0MS,IMULT,EN,MS;
       $SEQ ,CONT
       $MWR AR,P0MS;

PROGRAM ENDS


Date: 6/30/92
Size:  573 


File:  B.-IMT.ASM 

Last  Modified:  Mon  Feb  10  04:06:44  1992 
RESULT IS OUTPUT VIA IOR, LEAST SIGNIFICANT BITS ONLY


/* IORZADD.ASM
   R0=0
   R1=0
   R2=R1+R0
   IOR=R2

   Tests the ALU1 with the data register file. Adds two numbers from reg 0
   and reg 1. Clears reg 0 and 1 with data from immediate field, then adds
   reg0 to reg1, puts ans in reg2. Operands are zero, result should be zero.
   Result appears at IO port real side (least significant only).*/


PROGRAM CODESEG MSRAM
ORG 0

START: $SEQ ,CONT
       $REG CLEAR,0X00,CLEAR,0X01;
       $SEQ ,CONT $REG ,,,,,0X00,0X01;
       $SEQ ,CONT;
       $SEQ ,CONT $A1 REGA,P0LS,REGB,P0LS,,IADD,EN,LS;
       $SEQ ,CONT $A1 ,,,,,,HOLD,LS $REG A1,0X02;
       $SEQ ,CONT $REG ,,,,,0X02;
       $SEQ ,CONT $IOR REGA,P0LS;
PROGRAM ENDS
File: IORADD.ASM
Last Modified: Mon Feb 10 05:[time illegible]


PROGRAM CODESEG MSRAM

3  LOOP?  $SEQ  (COirPSIOB  A1  SIOl  Mi 

$SEQ ,CONT;

$SEQ ,CONT $A1 IOR,P0LS,IOI,P0LS,,IADD,EN,LS;

$SEQ ,CONT;

$SEQ ,CONT $IOR A1,P0LS;

-  $sEg  LOOPpLi; 

PROGRAM ENDS
VPH PROGRAMS


Date: 6/30/92
Size:  4607 


File: B:RCONV.ASM

Last  Modified:  Tue  Jun  30  15:26:58  1992 
/*VPH code for convolution of a real sequence of up to 64 points with
another longer real sequence, producing up to 1024 outputs. This
size can be done with a single FIR instruction. This code can be
called repeatedly on a single processor to handle convolutions where
more than 1024 output points are required as long as the shorter
sequence is still less than 64 points. However, a different routine
designed for a longer convolution would be more efficient. This
same code can be used on multiple VSP chips simultaneously to give
a considerable speed increase. There may be no benefit to executing
on more than one VSP chip per bus because the FIR instruction may not
give up the bus between output points.

To get a full convolution of the input requires padding both ends of
the longer input sequence with a number of zeroes equal to the length
of the shorter sequence minus one. This is required in order to
explicitly provide the zeroes that are assumed to be multiplied by
elements of the shorter sequence that extend beyond the ends of the
longer one during the convolution process. The length of the output
sequence should be equal to the sum of the lengths of the (unpadded)
input sequences minus one. If a circular convolution is desired
instead of a linear one, the zero padding should be replaced with
points from the other end of the input sequence.

The shorter input length is passed in Coef_Length. The output length
(equal to input length before padding plus coefficient length minus one)
is passed as Out_Length. Coefficients points to the shorter sequence
(typically FIR filter coefficients). In_Data points to the start
of the longer sequence (possibly a zero pad). The output is placed
at Out_Data. Typical call for a four tap filter:

    CALL RCONV(4, 1024, &Coef, &In, &Out)

The convolution can be performed in place with careful choices of
parameter values. If the convolution requires multiple calls on a
single VSP chip, the output must begin at the first location of the
long input. This avoids overwriting inputs that will be needed for
the next call. However, if multiple chips are being used, the output
must overwrite the last input used in its computation. This works
because the VSP chip has already read the input into internal RAM
for further use. It is necessary because that input is the first
one which will not be needed by the chip working on the previous
portion of the convolution. Some further care is needed in the
initial startup of in-place multiple chip convolution to ensure that
a chip does not write over any input values before the subsequent
chip reads them in. A multiple call, multiple chip convolution
cannot be done in place because the constraints are contradictory.
However, such a large data set would not fit into shared memory.

Splitting up a convolution between NUM_CHIPS chips would require
something like the following invocation for chip ranging from zero
to (NUM_CHIPS - 1):

    CALL RCONV(COEF_LEN, OUT_SIZE(chip), &Coef,
               &(In + DATA_OFFSET(chip)), &(Out + DATA_OFFSET(chip)));

with the definitions

    #define OUT_LEN (IN_LEN + COEF_LEN - 1)
    #define DATA_OFFSET(CHIP) (((CHIP) * OUT_LEN) / NUM_CHIPS)
    #define OUT_SIZE(CHIP) (DATA_OFFSET(CHIP+1) - DATA_OFFSET(CHIP))

Note that since all this routine does is to load various values into
internal registers and RAM and then execute a single instruction, it
might be faster for the 68020 to load the values directly and execute
the FIR instruction in slave mode. The same applies to the complex
convolution and the correlations.

*/


zsp325()
{
    SUBROUTINE RCONV(zr325int Coef_Length,
                     zr325int Out_Length,
                     zr325ref Coefficients,
                     zr325ref In_Data,
                     zr325ref Out_Data)

    /*set up mode properly, one RAM bank, 24 bit integers */
    SET [ =RMS, =!XOR, =!FMT ];

    /*set $SAR to put output in correct place */
    LDR Out_Data => $SAR;

    /*to get real coefficients in zig-zag order, need to load half
      as many (rounded up) "complex" coefficients */
    SHLSETR:[SHIFT=17] Coef_Length => $PR;
    ADDR $PR, #0x020060;

    /*load coefficients in reverse zig-zag real order */
    LDR Coefficients => $A;
    ADDR $A, Coef_Length;
    SUBR $A, #2;
    LD_(I,R):($NMPT) $A:(-1,1) => $C0;

    /*now set up actual lengths for FIR instruction */
    SHLSETR:[SHIFT=16] Coef_Length => $PR;
    ADDR $PR, Out_Length;

    /*convolve with input sequence */
    FIR_R:($NMPT, $REPEAT) $C0, *In_Data;
}
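
The padding rule and the DATA_OFFSET / OUT_SIZE split described in the RCONV comment block can be checked with a small reference model. The Python sketch below is an illustration only and makes several assumptions: it models the arithmetic, not the VSP FIR instruction; the function names are hypothetical; and each "chip" is simulated as one loop iteration reading from the padded input at its own offset.

```python
def data_offset(chip, out_len, num_chips):
    """First output index handled by `chip` (mirrors the DATA_OFFSET macro)."""
    return (chip * out_len) // num_chips


def out_size(chip, out_len, num_chips):
    """Number of outputs handled by `chip` (mirrors the OUT_SIZE macro)."""
    return (data_offset(chip + 1, out_len, num_chips)
            - data_offset(chip, out_len, num_chips))


def fir_block(coef, window, count):
    """Produce `count` outputs from an already padded input window."""
    taps = len(coef)
    rev = coef[::-1]  # coefficients are applied in reverse order
    return [sum(rev[j] * window[n + j] for j in range(taps))
            for n in range(count)]


def pad_input(coef, data):
    """Pad both ends with (coefficient length - 1) zeroes."""
    pad = [0] * (len(coef) - 1)
    return pad + list(data) + pad


def full_convolution(coef, data):
    """Linear convolution computed in a single call."""
    out_len = len(data) + len(coef) - 1
    return fir_block(coef, pad_input(coef, data), out_len)


def split_convolution(coef, data, num_chips):
    """Same result, computed as num_chips independent pieces."""
    padded = pad_input(coef, data)
    out_len = len(data) + len(coef) - 1
    out = []
    for chip in range(num_chips):
        start = data_offset(chip, out_len, num_chips)
        count = out_size(chip, out_len, num_chips)
        out.extend(fir_block(coef, padded[start:], count))
    return out
```

The integer division in `data_offset` spreads any remainder across the chips, so the piece sizes always sum to the full output length.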




Date:  6/30/92 
Size:  4158 


File: 

Last  Modified:  Tue  Jun  30  15: 



/*VPH code for convolution of a complex sequence of up to 32 points with
another longer complex sequence, producing up to 1024 outputs. This
size can be done with a single FIR instruction. This code can be
called repeatedly on a single processor to handle convolutions where
more than 1024 output points are required as long as the shorter
sequence is still no more than 32 points. However, a different routine
designed for a longer convolution would be more efficient. This
same code can be used on multiple VSP chips simultaneously to give
a considerable speed increase. There may be no benefit to executing
on more than one VSP chip per bus since the FIR instruction may not
give up the bus between output points.

To get a full convolution of the input requires padding both ends of
the longer input sequence with a number of complex zeroes equal to the
length of the shorter sequence minus one. This is required in order
to explicitly provide the zeroes that are assumed to be multiplied by
elements of the shorter sequence that extend beyond the ends of the
longer one during the convolution process. The length of the output
sequence should be equal to the sum of the lengths of the (unpadded)
input sequences minus one. If a circular convolution is desired
instead of a linear one, the zero padding should be replaced with
points from the other end of the input sequence.

The shorter input length is passed in Coef_Length. The output length
(equal to input length before padding plus coefficient length minus one)
is passed as Out_Length. Coefficients points to the shorter sequence.
In_Data points to the start of the longer sequence (possibly a zero
pad). The output is placed at Out_Data. Typical call:

    CALL CCONV(4, 1024, &Coef, &In, &Out)

The convolution can be performed in place with careful choices of
parameter values. If the convolution requires multiple calls on a
single VSP chip, the output must begin at the first location of the
long input. This avoids overwriting inputs that will be needed for
the next call. However, if multiple chips are being used, the output
must overwrite the last input used in its computation. This works
because the VSP chip has already read the input into internal RAM
for further use. It is necessary because that input is the first
one which will not be needed by the chip working on the previous
portion of the convolution. Some further care is needed in the
initial startup of in-place multiple chip convolution to ensure that
a chip does not write over any input values before the subsequent
chip reads them in. A multiple call, multiple chip convolution
cannot be done in place because the constraints are contradictory.
However, such a large data set would not fit into shared memory.

Splitting up a convolution between NUM_CHIPS chips would require
something like the following invocation for chip ranging from zero
to (NUM_CHIPS - 1):

    CALL CCONV(COEF_LEN, OUT_SIZE(chip), &Coef,
               &(In + 2*DATA_OFFSET(chip)), &(Out + 2*DATA_OFFSET(chip)));

with the definitions

    #define OUT_LEN (IN_LEN + COEF_LEN - 1)
    #define DATA_OFFSET(CHIP) (((CHIP) * OUT_LEN) / NUM_CHIPS)
    #define OUT_SIZE(CHIP) (DATA_OFFSET(CHIP+1) - DATA_OFFSET(CHIP))

DATA_OFFSET is doubled when used with pointer parameters because
each complex element requires two machine words.
*/

zsp325()
{
    SUBROUTINE CCONV(zr325int Coef_Length,
                     zr325int Out_Length,
                     zr325ref Coefficients,
                     zr325ref In_Data,
                     zr325ref Out_Data)

    /*set up mode properly, one RAM bank, 24 bit integers */
    SET [ =RMS, =!XOR, =!FMT ];

    /*set $SAR to put output in correct place */
    LDR Out_Data => $SAR;

    /*now set up lengths for LD and FIR instructions */
    SHLSETR:[SHIFT=18] Coef_Length => $PR;
    ADDR $PR, Out_Length;

    /*load coefficients in reverse order */
    LDR Coefficients => $A;
    ADDR $A, Coef_Length;
    SUBR $A, #2;
    LD_C:($NMPT) $A:(-1,1) => $C0;

    /*convolve with input sequence */
    FIR_C:($NMPT, $REPEAT) $C0, *In_Data;
#/
}
/*Routine to perform rectangular to polar conversion on a complex vector.
Uses a Cordic-like algorithm for magnitude and an arctangent lookup
table for angle in radians. Maximum error in magnitude is 2% for
three iterations, which can easily be reduced to a value as low as
0.0002% by increasing the number of iterations to eight. Maximum error
in angle is 2.33% for 5 bits from each mantissa, which requires a table
of 1K entries for first quadrant angles only. The table size must be
quadrupled for each doubling in precision, so this approach is not
practical for high precision.

This program computes only first quadrant angles. Other angles are
moved into the first quadrant by taking the absolute value of both
components. This means that the angle will be correct for the first
quadrant, equal to pi minus the true angle in the second quadrant,
equal to the true angle minus pi in the third quadrant and equal to
minus the true angle in the fourth quadrant. These angles are the
absolute values of the angles between the complex numbers and the
nearest real axis. If full angles are needed, the table can just be
quadrupled to handle sign bits in the index.

The vector length is passed in the parameter Length. The parameter
In_Data points to the vector to be converted. The output is placed
at Out_Data. The conversion can be performed in place if desired.
*/


/*need arctangent function for table */
#include <math.h>

/*number of bits from each mantissa to be used in arctangent table lookup */
#define TAB_BITS 5

/*number of Cordic iterations for magnitude calculations */
#define MAG_ITER 3

/*function to return arctangent table value for index number */
/*only handles first quadrant angles, but could be modified for all four */
float tabentry(int i)
{
    int fbits[2];
    int part;
    int index;

    /*determine numbers that would have produced the given index */
    for (part = 0; part <= 1; part++)
    {
        /*extract interleaved mantissa bits from index */
        fbits[part] = 0;
        for (index = 0; index < TAB_BITS; index++)
            fbits[part] |= (1 << index) & (i >> index + part);
    }

    /*return middle angle of the possible range */
    return (atan2((double) fbits[0] + 1, (double) fbits[1]) +
            atan2((double) fbits[0], (double) fbits[1] + 1)) / 2.0;
}


/*actual assembly generation function */
zsp325()
{
    int index;

    /*Generate arctangent table. Because of normalization, only first
      entry and last three quarters of table are actually used. */

/#
AtanTab::
#/

    for (index = 0; index < (1 << TAB_BITS*2); index++)
/#
        .DATA { (IEEE_Float(tabentry(index))) };


SUBROUTINE RECT2POL(zr325int Length, zr325ref In_Data, zr325ref Out_Data)

/*set up two RAM sections, swapping on each loop iteration */
SET [ =!RMS, =XOR ];

/*load data pointers, parameter order gets In_Data into $A */
LDR Out_Data => [$B, $A];

/*initialize loop count to number of 32s, skip loop if none */
SHRSETR:[SHIFT=5] Length => $LC;
JMPC [ZR], Do_Rest;

/*first part of loop to fill software pipeline */
/*load to bank 1, take absolute value to put in first quadrant */
M TT:(32) $A => $C1;

/*align mantissas and interleave to create atan index in $I0 */
ALIGN:(32) $R1, $I1 => $I0;

/*do cordic iterations to get magnitude in $R1, takes a while */
MAG:(32,MAG_ITER) $C1;

/*look up arctangent in table, overlaps with MAG */
LUT:(32):[SHIFT=(23 - 2*TAB_BITS)] AtanTab, $I0 => $I0;

/*store angle, overlaps with MAG */
ST_I:(32) $I0 => $B..1:(2,1);

/*decrement $LC, end loop if done */
JMPC:[IE,DL] [LZ], Do_Store;

/*software pipelined loop, allows next load to overlap MAG */
Loop::
T1:(32) $A+=64 => $C1;

/*store magnitude from previous vector */
ST_R:(32) $R0 => $B«1:(2,1);

ALIGN:(32) $R1, $I1 => $I0;
MAG:(32,MAG_ITER) $C1;
LUT:(32):[SHIFT=(23 - 2*TAB_BITS)] AtanTab, $I0 => $I0;
ST_I:(32) $I0 => $B+=64:(2,1);

/*decrement counter and branch to top if not done */
JMPC:[IE:1,DL:1] [!LZ], Loop;

Do_Store::
/*rest of loop to empty software pipeline */
/*store magnitude from last vector */
ST_R:(32) $R0 => $B-1:(2,1);

Do_Rest::
/*handle remainder left after blocks of 32 */
/*shift remainder into $NMPT, use [TC] to zero high bit */
SHLSETR:[SHIFT=18,TC] Length => $PR;
JMPC [ZR], End;

/*need MAG_ITER in $REPEAT to use $PR with MAG */
ADDR $PR, #MAG_ITER;

/*finish up remainder */
I'D Ti:($NMPT) $A+=64 => $C1;
ALIGN:($NMPT) $R1, $I1 => $I0;
MAG:($NMPT,$REPEAT) $C1;
LUT:($NMPT):[SHIFT=(23 - 2*TAB_BITS)] AtanTab, $I0 => $I0;
ST_I:($NMPT) $I0 => $B+=64:(2,1);
ST_R:($NMPT) $R1 => $B-1:(2,1);
/*Program to compute 8x8 2D complex FFT using one VSP chip.
The parameter In_Data points to the input vector. The output vector
is placed at Out_Data. The operation can be performed in place if
desired. Both input and output vectors are in normal order.

To get an inverse FFT, just change the subroutine name and change the
FFT instructions to IFFT instructions.

To use real data, change LD_C to LD_(R,0).

Might be able to squeeze a little more speed out by starting with
two RAM sections, load first, FFT first rows, load second, FFT second
rows, switch to one RAM section, FFT columns, store.
*/

SUBROUTINE FFT2D8(zr325ref In_Data, zr325ref Out_Data)

/*set up one RAM section */
SET [ =RMS, =!XOR ];

/*load all 64 entries, with rows bit reversed */
LD_C:(64) *In_Data:(8,8~) => $0;

/*FFT the rows, result in normal order */
FFT_C:(8,8):[FPS:1,LPS:4] $0~, $ROM-0:512;

/*FFT the columns, result in bit reversed order */
FFT_C:(64):[FPS:32,LPS:8] $0;

/*store result bit reversing columns into normal order */
ST_C:(64) $0 => *Out_Data:(8,a);
#/
}
/*Routine to compute magnitude squared for a complex vector. If the vector
is the FFT of a signal, this is the power spectrum of the signal. This
routine is faster than the rectangular to polar conversion and should be
used if the magnitude squared is as useful as the magnitude. For example,
the point of maximum magnitude is also the point of maximum power.

This routine can be performed in place, producing an output vector half
the length of the input. This would leave gaps if multiple VSP chips
were being used. If the calculation is not performed in place, or gaps
are acceptable, there is no problem using multiple chips to calculate
parts of the output vectors.

Note: this routine is I/O bound even on a single VSP. With two sharing
a bus, it will be even worse. If it is being used immediately after an
FFT operation, it would be more efficient to perform the magnitude
squared operation as the last step of an FFT routine before storing the
result. This would save a store and reload.

The input parameter Length contains the number of elements in the
input vector. The parameter In_Data points to the start of the
input vector. The output will be placed at Out_Data.

*/


zsp325()
{
/#
SUBROUTINE POWER(zr325int Length, zr325ref In_Data, zr325ref Out_Data)

/*use both RAM banks to improve throughput */
/*set up two RAM sections, swapped by $LC */
SET [ =!RMS, =XOR ];

/*set up pointers to data areas, compensate $B for pre-increment */
/*Note: Load depends on parameter order to get In_Data into $A */
LDR Out_Data => [$B, $A];
SUBR $B, #32;

/*initialize loop count to number of 32s, skip loop if none */
SHRSETR:[SHIFT=5] Length => $LC;
JMPC [ZR], Do_Rest;

/*start up with first RAM bank */
LD_C:(32) $A => $C0;
MGSQ_R:(32) $C0 => $R0;

/*if no more to do, skip rest of loop */
JMPC:[IE,DL] [LZ], Do_Store;

/*loop with software pipeline, XOR with $LC alternates RAM */
LD_C:(32) $A+=64 => $C0;
MGSQ_R:(32) $C0 => $R0;
ST_R:(32) $R1 => $B+=32;
LOOP:[IE,DL] [!LZ], #3;

Do_Store::
/*save last RAM bank */
ST_R:(32) $R1 => $B+=32;

Do_Rest::
/*handle remainder left after blocks of 32 */
/*shift remainder into $NMPT, use [TC] to zero high bit (32s) */
SHLSETR:[SHIFT=18,TC] Length => $PR;
JMPC [ZR], End;

/*finish up remainder */
LD_C:($NMPT) $A+=64 => $C0;
MGSQ_R:($NMPT) $C0 => $R0;
ST_R:($NMPT) $R0 => $B+=32;

End::
#/
}
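
The remark in the POWER comment block that the point of maximum magnitude is also the point of maximum power follows from squaring being monotonic on non-negative values. The short Python sketch below illustrates this on a made-up spectrum. It is a reference model only, not the VSP routine, and the names `magnitude_squared` and `peak_index` are hypothetical.

```python
def magnitude_squared(vec):
    """Return re*re + im*im for each complex element (one real per input)."""
    return [z.real * z.real + z.imag * z.imag for z in vec]


def peak_index(values):
    """Index of the largest value in a list of reals."""
    return max(range(len(values)), key=values.__getitem__)
```

Searching the squared values avoids the square root (or the rectangular to polar conversion) entirely, which is why the cheaper routine is sufficient for peak detection.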
File: B:FINISH.ASM

Last Modified: Tue Jun 30 14:39:12 1992


/*Notify 68020 of task completion. This code is never actually
called from anywhere. Instead, its address is used as the return
address in the call frame that the 68020 sets up when invoking another
routine. When the routine completes and returns, it will execute this
code. This method allows all routines to be called without having
them terminate the task until final completion.
*/

/*status bit value to indicate finished */
#define FINISHED 2

zsp325()
{
/#
SUBROUTINE FINISH()

/*get value for status bits */
LDR #FINISHED => $X;

/*make sure all operations are complete */
SYNC:[AS,CU,BU,MU];

/*write to global status latch */
STR $X => 0x40000;

/*halt */
HLT;
#/
}


Date: 6/30/92

Size: 4868


File: B:PIPE_P2R.ASM
Last Modified: Wed May 20 15:13:74 1992


/*Routine to perform polar to rectangular conversion on a complex vector.
Uses separate sine and cosine tables. Could use one table for both,
but that would require extra time. Only operates on angles in the first
quadrant since those are the only ones produced by the rectangular to
polar conversion. The table size will determine the accuracy of the
conversion. The error will be less than 100% * pi / (4 * table size).

The vector length is passed in the parameter Length. The parameter
In_Data points to the start of the vector to be converted. The result
is placed at Out_Data. This algorithm can be performed in place if
desired.

This routine uses software pipelining to maximize throughput. This
should cause the bus to be busy most of the time. If two chips are
performing this at the same time, there will not be enough bandwidth.
Benchmarking will need to be used to determine whether this is faster
than a version which does not attempt pipelining but uses larger blocks.
*/

/*need trig functions for tables */
#include <math.h>

/*size of sine and cosine tables */
#define TAB_SIZE 128

/*size of increment between table entries */
#define INCREMENT (asin(1.0)/(TAB_SIZE-1))

/*assembly generation function */
zsp325()
{
    int index;

    /*Generate trig function tables. */
/#
SinTab::
#/
    for (index = 0; index < TAB_SIZE; index++)
/#
        .DATA { (IEEE_Float(sin(index*INCREMENT))) };

CosTab::
#/
    for (index = 0; index < TAB_SIZE; index++)
/#
        .DATA { (IEEE_Float(cos(index*INCREMENT))) };


SUBROUTINE POL2RECT(zr325int Length, zr325ref In_Data, zr325ref Out_Data)

/*use both RAM banks to optimize throughput */
/*Note: chosen interleaving pattern assumes LUT instruction
  makes no use of BU since it is a data movement instruction.
  Also assumes that arithmetic operations that use external
  operands can't be overlapped with move instructions, though
  this isn't clear.
  Benchmark might be needed to check the interleaving pattern. */

/*set up two RAM sections, swapped by $LC, round to nearest */
SET [ =!RMS, =XOR, =ROUND ];

/*load pointers to data, shifting $A to angle, compensate pre-inc */
ISETR In_Data => $A;
LDR Out_Data => $B;
SUBR [$B, $A], #64;

/*initialize loop count to number of 32s, skip loop if none */
SHRSETR:[SHIFT=5] Length => $LC;
JMPC [ZR], Do_Rest;

/*start up conversion with first RAM bank */
/*load angle into imaginary part */

/*multiply by factor to get table offset */
MULT_(R,R):(32) $C0, #(IEEE_Float(1.0/INCREMENT)) => $I0;

/*convert to integer to get integer part right justified */
FPINT_R:(32) $I0 => $I0;

/*if no more to do, skip rest of loop */
JMPC:[IE,DL] [LZ], Do_Store;

/*loop with software pipelining */
/*load and start next vector */
LD_I:(32) $A+=64:(2,1) => $I0;
MULT_(R,R):(32) $C0, #(IEEE_Float(1.0/INCREMENT)) => $I0;

/*do bus operation for previous during execution of current */
LUT_R:(32) CosTab, $I1 => $R1;

/*do next operation on current vector */
FPINT_R:(32) $I0 => $I0;

/*finish and store previous vector */
LUT_R:(32) SinTab, $I1 => $I1;

/*assume external operand fetch monopolizes bus unit */

/*decrement count (switches banks) and loop immediately if not done */
JMPC:[DL,IE] [!LZ], Loop;

Do_Store::
/*finish up last RAM bank */
/*look up cosine of angle in table */
LUT_R:(32) CosTab, $I1 => $R1;

/*look up sine of angle in table */
LUT_R:(32) SinTab, $I1 => $I1;

/*multiply cosine and sine by magnitude to get real and imaginary */
MULT_(R,R):(32) $C1, $A-1:(2,1) => $C1;

/*store resulting complex number in rectangular coordinates */
ST_C:(32) $C1 => $B+=64;

Do_Rest::
/*handle any remainder left after blocks of 32 */
/*shift remainder into $NMPT, use [TC] to zero high bit (32s) */
SHLSETR:[SHIFT=18,TC] Length => $PR;
JMPC [ZR], End;

/*finish remainder */
/*load angle into imaginary part */
LD_I:($NMPT) $A+=64:(2,1) => $I0;

/*multiply by factor to get table offset */
MULT_(R,R):($NMPT) $C0, #(IEEE_Float(1.0/INCREMENT)) => $I0;

/*convert to integer to get integer part right justified */
FPINT_R:($NMPT) $I0 => $I0;

/*look up cosine of angle in table */
LUT_R:($NMPT) CosTab, $I0 => $R0;

/*look up sine of angle in table */
LUT_R:($NMPT) SinTab, $I0 => $I0;

/*multiply cosine and sine by magnitude to get real and imaginary */
MULT_(R,R):($NMPT) $C0, $A-1:(2,1) => $C0;

/*store resulting complex number in rectangular coordinates */
ST_C:($NMPT) $C0 => $B+=64;

}


Date: 6/30/92

Size: 3807


Last  Modified: 
/*Routine to perform polar to rectangular conversion on a complex vector.
Uses separate sine and cosine tables. Could use one table for both, but
that would require extra time. Only operates on angles in the first
quadrant since those are the only ones produced by the rectangular to
polar conversion. Other angles will produce unexpected results. The
table size will determine the accuracy of the conversion. The error
will be less than 100% * pi / (4 * table size).

Length of the vector to be converted is passed in Length. In_Data
points to the start of the input vector. Output is placed at location
Out_Data. Conversion can be performed in place if desired.

This version assumes performance is bounded by local bus bandwidth and
therefore doesn't attempt software pipelining alternating RAM banks.
Instead it uses the entire RAM at once to minimize bus traffic for
instruction fetching. This also makes the code more readable. Testing
will be needed to see which method is faster. Using half of RAM and
loading magnitude in other half before MULT might save more bandwidth.
*/

/*need trig functions for tables */
#include <math.h>

/*size of sine and cosine tables */
#define TAB_SIZE 128

/*size of increment between table entries */
#define INCREMENT (asin(1.0)/(TAB_SIZE-1))

/*assembly generation function */

zsp325()
{
    int index;

    /*Generate trig function tables. */

/#
SinTab::
#/

    for (index = 0; index < TAB_SIZE; index++)
/#
        .DATA { (IEEE_Float(sin(index*INCREMENT))) };

CosTab::
#/

    for (index = 0; index < TAB_SIZE; index++)
/#
        .DATA { (IEEE_Float(cos(index*INCREMENT))) };
#/

-  /» 

-  SUBBOUTIRE  POL2KBCX( zr3251nt  Length,  zr32Sref  In.Date,  zr325ref  Out_I>ate) 

-  /*aet  up  one  RAM  aactlon,  net  rounding  to  nearaat  */ 

-  SET  (  •RMS,  •IXOR,  -ROUND  ]; 

-  /*load  polntera  to  data,  conpenaate  for  pre- increaent  */ 

-  /*increBent  $A  at  load  ao  It  polnta  to  angle  part  */ 

-  ISETR  In  Data  ->  $A; 

-  LDR  Out  Data  ->  $8; 

-  SUBR  ($X,  $B],  1128; 

SHRSETR: (SBIFt-i?  Length  •>  $LC; 

-  JMPC  (ZRI,  Do_Reat; 

-  Loop 


/‘load  angle  Into  laaglnary  part  */ 

LD_I:(64)  SA‘-128:(2,I)  ->  $1; 

/‘■ultlply  by  factor  to  gat  table  offaet  */ 

MULT  (R,R']:(64)  $C,  f(IBSE  Float(  1.0/INCREMENT)  1 
/‘convert  to  integer  to  gaC  integer  part  right  nuatlflad  ‘/ 
FPINT  R:(64)  SI  -5  51;  — .  • 

/‘looK  up  coaina  of  angle  in  table  ‘/ 


.5?.-  , 


-  -  - angle 

LUT  R:(6i)  CoeTab,  $I  -»  $R; 

/‘ISok  up  aina  of  angle  in  table  ‘/ 
L0T_R;(6S)  SinTab,  $I  ->  SI 


-  /‘■ultlply  coaina  and  aina  by  aagnitude  to  got  real  and  laaglnaiy  ‘/ 

-  M0LT_(R,r1i(64]  sc,  5A-l;(2,l2  -S  SC; 

-  /‘atora  raaulting  coaplex  nuabar  in  rectangular  coordlnataa  ‘/ 


rectangular 
dlataly  on  not  zero  ‘/ 


ting  coapli 
ST  C:(64)  SC  ->  $B‘-128; 

/‘Dacraaant  $LC,  loop  law 
JMPC; [DL, IE]  (ILZ],  toop; 

Do  Neat; 1 

/‘Handle  raaalndar  left  after  blocka  of  64  ‘/ 

/‘ahift  raaalndar  into  SHHPT,  aklp  if  none  ‘/ 
8HLSBTR:  ( SHIFT-18]  Length  ->  $PR; 

JMPC  [ZR],  End; 


/‘flnlah  ri 
/‘load  am 


Inary  ‘/ 


- _ binder  ‘/ 

angle  into  laag 

IiCSHMfTJ  $A‘-128:r _ 

/‘■ultlply  by  factor  to  get  table  offaet  ‘/ 

MULT_(R,Rlt($IIMPT)  $C,  tllEKE  Float]  1.0/IRCRBKINT) )  •>  SI; 

/‘covert  to  integer  to  get  integer  part  right  juatlflad  */ 
FPIHT_Rj($NMPT)^  ->  SI;  r- 

/‘looK  up  coaina  of  angle  in  table  ‘/ 

LUT  R:($mn)  CoeTab,  $1  ->  SR; 

/‘ISok  up  elaa  of  angle  in  table  ‘/ 

LUT  RiJSImPT)  SinTab'  $I  ->  SI; 

/‘■ultlply  coelna  and  aliw  by  aagnitude  to  gat  real  and  laaglnary  ‘/ 


iiS  :  coordinates  •/ 

112  -  ST  C:($IIMPT)  SC  ->  SB+-128; 

113  -  ~ 

114  -  End:: 

115  - 

116  -  V 

117  -  »/ 

118  -  } 
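
The table-lookup conversion in the listing above can be modelled on a host machine to check the stated accuracy. The following Python sketch is not VSP code: the function names `pol2rect` and `worst_error` are illustrative assumptions, and the index is rounded to the nearest table entry on the assumption that this is what the ROUND mode does in FPINT. With nearest rounding, the angle error is at most half a table step, so each component error is bounded by pi / (4 * (table size - 1)), which is close to the bound quoted in the header comment.

```python
import math

TAB_SIZE = 128                                # same table size as the listing
INCREMENT = math.asin(1.0) / (TAB_SIZE - 1)   # angle step between table entries

SIN_TAB = [math.sin(i * INCREMENT) for i in range(TAB_SIZE)]
COS_TAB = [math.cos(i * INCREMENT) for i in range(TAB_SIZE)]


def pol2rect(vector):
    """Convert (magnitude, angle) pairs to (real, imaginary) pairs.

    Angles must lie in the first quadrant, 0 <= angle <= pi/2.
    """
    out = []
    for magnitude, angle in vector:
        index = round(angle / INCREMENT)      # nearest table entry (assumed rounding)
        out.append((magnitude * COS_TAB[index], magnitude * SIN_TAB[index]))
    return out


def worst_error(samples=10001):
    """Largest component error for unit magnitudes swept over the first quadrant."""
    worst = 0.0
    for k in range(samples):
        angle = (math.pi / 2) * k / (samples - 1)
        re, im = pol2rect([(1.0, angle)])[0]
        worst = max(worst, abs(re - math.cos(angle)), abs(im - math.sin(angle)))
    return worst
```

Calling `worst_error()` gives the measured worst case for a 128-entry table; doubling `TAB_SIZE` should roughly halve it.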
/*Routines to compute 16x16 2D complex FFT using four VSP chips.

Operation requires two phases of operation, one to calculate row
FFTs, the other to calculate column FFTs.  Using multiple VSP chips
requires synchronization between phases so that data can be exchanged.
These routines do not include the synchronization.  The routines for
each phase can be called from another routine which provides it between
calls, or the 68020 can invoke the first phase and wait for it to
finish before invoking the second.

Each VSP chip could calculate its four rows or columns in one instruction,
but using two RAM sections allows more concurrency.  Each chip should
be passed data pointers to row or column (CHIP * 4) with CHIP equalling
0, 1, 2, or 3, depending on the chip.

The parameter In_Data points to the input vector.  The output vector
is placed at Out_Data.  The operation can be performed in place if
desired.  Both input and output vectors are in normal order.

To get an inverse FFT, just change the subroutine name and change the
FFT instructions to IFFT instructions.

To use real data, either set the imaginary parts to zero to get a complex
vector, or change LD_C to LD_(R,0) to use a real vector.  With a real
vector, this operation cannot be performed in place, since the output
data would overwrite unread input data.
*/

zsp325()
{
/#
/*FFT for rows, four 16 point FFTs on sequential data */
SUBROUTINE FFT16ROW(zr325ref In_Data, zr325ref Out_Data)
{
    /*set up two RAM sections, no need for exchange */
    SET [ =INMS, =IXOR ];

    /*set up pointers for later offset, $A gets In_Data */
    LDR Out_Data -> [$B, $A];

    /*load two rows into section 0 */
    LD_C:(32) $A -> $C0;

    /*FFT as two 16 element FFTs */
    FFT_C:(16,2):[FPS:8,LPS:1] $C0;

    /*load remainder of entries into section 1 */
    LD_C:(32) $A+64 -> $C1;

    /*FFT as two 16 element FFTs */
    FFT_C:(16,2):[FPS:8,LPS:1] $C1;

    /*store first result, row bit-reversed */
    ST_C:(32) $C0 -> $B:(16,16~);

    /*store second result, row bit-reversed */
    ST_C:(32) $C1 -> $B+64:(16,16~);
}

/*FFT for columns, four 16 point FFTs on interleaved data */
SUBROUTINE FFT16COL(zr325ref In_Data, zr325ref Out_Data)
{
    /*set up two RAM sections, no need for exchange */
    SET [ =INMS, =IXOR ];

    /*set up pointers for later offset, $A gets In_Data */
    LDR Out_Data -> [$B, $A];

    /*load two columns interleaved into section 0 */
    LD_C:(32) $A:(16,2) -> $C0;

    /*FFT first set as two 16 element FFTs */
    FFT_C:(32):[FPS:16,LPS:2] $C0;

    /*load remainder of entries into section 1 */
    LD_C:(32) $A+2:(16,2) -> $C1;

    /*FFT as two 16 element FFTs */
    FFT_C:(32):[FPS:16,LPS:2] $C1;

    /*store first result, columns bit-reversed */
    ST_C:(32) $C0 -> $B:(16~,2);

    /*store second result, columns bit-reversed */
    ST_C:(32) $C1 -> $B+2:(16~,2);
}
#/
}
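
The two-phase scheme described in the header of the 16x16 listing (row FFTs, a synchronization point, then column FFTs, with the rows and columns divided among four chips) can be checked numerically. The Python sketch below is a host-side model only, under the assumption that each "chip" simply owns a contiguous block of rows or columns; the names `dft`, `fft2d_four_chips`, and `dft2d_direct` are illustrative and do not come from the report. A plain O(n^2) DFT stands in for the chip's FFT instruction.

```python
import cmath


def dft(x):
    """Direct discrete Fourier transform of a list of complex numbers."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def fft2d_four_chips(data, num_chips=4):
    """2D DFT of a square matrix, done as a row phase then a column phase."""
    n = len(data)
    per_chip = n // num_chips
    work = [list(row) for row in data]
    # Phase 1: each chip transforms its own block of rows.
    for chip in range(num_chips):
        for r in range(chip * per_chip, (chip + 1) * per_chip):
            work[r] = dft(work[r])
    # Synchronization point: all rows must be finished before columns start.
    # Phase 2: each chip transforms its own block of columns.
    for chip in range(num_chips):
        for c in range(chip * per_chip, (chip + 1) * per_chip):
            col = dft([work[r][c] for r in range(n)])
            for r in range(n):
                work[r][c] = col[r]
    return work


def dft2d_direct(data):
    """Reference 2D DFT computed straight from the definition."""
    n = len(data)
    return [[sum(data[r][c] * cmath.exp(-2j * cmath.pi * (u * r + v * c) / n)
                 for r in range(n) for c in range(n))
             for v in range(n)] for u in range(n)]
```

The model ignores bit-reversed addressing, since the listing undoes it in the store instructions and leaves both input and output in normal order.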
/*Test program for Zoran interrupts.  Status bits follow interrupt bit.
*/

/*absolute base addresses from memory map */
#define PRAM 0x00000
#define FOUR_PORT 0x20000
#define STATUS_LATCH 0x40000

zsp325()
{
/#
INTERRUPT SUBROUTINE SET_HALT()
{
    /*write 1s to status latch and wait */
    LDR #3 -> $X;
    STR $X -> STATUS_LATCH;
Poll::
    JMPC [IEI], Poll;

    /*after resume, clear status bits */
    LDR #0 -> $X;
    STR $X -> STATUS_LATCH;
}

.EXTERN _SubEntry_SET_HALT;

SUBROUTINE MAIN()
{
    /*set interrupt vector (happens to be 0, but why not) */
    LDR #_SubEntry_SET_HALT -> [illegible];

    /*write 0 to status latch */
    LDR #0 -> $X;
    STR $X -> STATUS_LATCH;

    /*infinite loop decrementing $LC from 0 */
    MOVR $X -> $LC;
Loop::
    JMP:[DL] Loop;
}
#/
}
/*Routines to compute 32x32 2D complex FFT using four VSP chips.

Operation requires two phases of operation, one to calculate row
FFTs, the other to calculate column FFTs.  Using multiple VSP chips
requires synchronization between phases so that data can be exchanged.
These routines do not include the synchronization.  The routines for
each phase can be called from another routine which provides it between
calls, or the 68020 can invoke the first phase and wait for it to
finish before invoking the second.

Each chip should be passed data pointers to row or column (CHIP * 8) with
CHIP equalling 0, 1, 2, or 3, depending on the chip.  It will handle
the 8 rows or columns starting at that point.  Adding a parameter to
give the number of rows or columns to do would allow the same routine
to be used by 1 or 2 chips without needing to make multiple calls.

The parameter In_Data points to the input vector.  The output vector
is placed at Out_Data.  The operation can be performed in place if
desired.  Both input and output vectors are in normal order.

To get an inverse FFT, just change the subroutine name and change the
FFT instructions to IFFT instructions.

To use real data, either set the imaginary parts to zero to get a complex
vector, or change LD_C to LD_(R,0) to use a real vector.  With a real
vector, this operation cannot be performed in place, since the output
data would overwrite unread input data.
*/

zsp325()
{
/#
/*FFT for rows, eight 32 point FFTs on sequential data */
SUBROUTINE FFT32ROW(zr325ref In_Data, zr325ref Out_Data)
{
    /*set up two RAM sections, swapped by $LC */
    SET [ =INMS, =XOR ];

    /*set pointers to input and output, compensate for increment */
    /*note: depending on parameter order to get In_Data into $A */
    LDR Out_Data -> [$B, $A];
    SUBR $B, #64;

    /*initialize loop count */
    LDR #7 -> $LC;

    /*start up FFT with first RAM bank */
    LD_C:(32) $A -> $C1;
    FFT_C:(32) $C1, $ROM=0:0;

    /*loop 7 times, XOR with $LC alternates RAM */
    LD_C:(32) $A+=64 -> $C0;
    FFT_C:(32) $C0, $ROM=0:0;
    ST_C:(32) $C1 -> $B+=64:(32,1)~;
    LOOP:[DL:1] [ILZ], #3;

    /* save last RAM bank */
    ST_C:(32) $C1 -> $B+=64:(32,1)~;
}

/*FFT for columns, eight 32 point FFTs on interleaved data */
SUBROUTINE FFT32COL(zr325ref In_Data, zr325ref Out_Data)
{
    /*set up two RAM sections, swapped by $LC */
    SET [ =INMS, =XOR ];

    /*set pointers to data, compensate for first increment */
    /*note: depending on parameter order to get In_Data into $A */
    LDR Out_Data -> [$B, $A];
    SUBR $B, #2;

    /*initialize loop count */
    LDR #7 -> $LC;

    /*start up FFT with first RAM bank */
    LD_C:(32) $A:(32,1) -> $C1;
    FFT_C:(32) $C1, $ROM=0:0;

    /*loop 7 times, XOR with $LC alternates RAM */
    LD_C:(32) $A+=2:(32,1) -> $C0;
    FFT_C:(32) $C0, $ROM=0:0;
    ST_C:(32) $C1 -> $B+=2:(32,1)~;
    LOOP:[IE:1,DL:1] [ILZ], #3;

    /* save last RAM bank */
    ST_C:(32) $C1 -> $B+=2:(32,1)~;
}
#/
}
/*Program to perform complex correlation between two complex vectors with
up to 32 elements in the shorter one and up to 1024 elements in the output
using a single Zoran processor.  Due to requirements of the instruction
used, the longer complex vector must be padded at both ends with (shorter
length - 1) complex zero elements.  These are needed for when the shorter
vector extends beyond the end of the longer during the operation.  If the
vectors are the same length, either may be considered the longer one.

The length of the short vector is passed in the parameter Coef_Length.
The length of the desired output vector (typically equal to the sum of
the lengths of the input vectors, minus one) is passed in Out_Length.
The short input vector is pointed to by Coefficients.  The parameter
In_Data points at the first zero pad of the longer input vector.
The output is placed at Out_Data.  The output data could be stored in
the place of the first input vector if desired.  Typical call to perform
a full autocorrelation in place with a 32 (padded to 94) element vector:
    CALL CCORR(32, 63, &in, &(in+31), &in)
The (in+31) skips the padding at the front of the vector.

Note: if this routine will always be used for two equal length vectors,
only one length parameter is needed.  The other can be computed from it
with some extra overhead.  On the other hand, if this routine will be
used repeatedly for the same length, sending a precomputed $PR value
instead of a length would reduce overhead slightly.
*/

zsp325()
{
/#
SUBROUTINE CCORR(zr325int Coef_Length,
                 zr325int Out_Length,
                 zr325ref Coefficients,
                 zr325ref In_Data,
                 zr325ref Out_Data)
{
    /*set up mode properly, one RAM section, 24 bit integers */
    SET [ =RMS, =IXOR, =IFMT ];

    /*set $SAR to put output in correct place */
    LDR Out_Data -> $SAR;

    /*load vector lengths into parameter register
      $NMPT = Coef_Length, $REPEAT = Out_Length
    */
    SHLSETR:[SHIFT=18] Coef_Length -> $PR;
    ADDR $PR, Out_Length;

    /*load complex conjugate of coefficients */
    LD_*C:($NMPT) *Coefficients -> $C0;

    /*correlate with input sequence */
    FIR_C:($NMPT,$REPEAT) $C0, *In_Data;
}
#/
}
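
The padding rule and the parameter-register packing used by CCORR can be illustrated with a small host-side model. This Python sketch is an assumption-laden illustration, not the FIR instruction itself: `pack_pr` mirrors only the shift-by-18-and-add sequence visible in the listing, and `ccorr` computes a full cross-correlation by sliding the conjugated short vector across the zero-padded long one. Both function names are hypothetical.

```python
def pack_pr(coef_length, out_length):
    """Pack the two lengths as the listing does: shift left 18, then add."""
    return (coef_length << 18) + out_length


def ccorr(coefficients, data):
    """Full complex cross-correlation using explicit zero padding."""
    m = len(coefficients)
    out_length = len(data) + m - 1
    # Pad both ends with (shorter length - 1) complex zeros.
    padded = [0j] * (m - 1) + list(data) + [0j] * (m - 1)
    # The listing loads the complex conjugate of the coefficients.
    conj = [c.conjugate() for c in coefficients]
    return [sum(conj[k] * padded[n + k] for k in range(m))
            for n in range(out_length)]
```

For an autocorrelation, the middle output equals the sum of squared magnitudes of the input and the outputs on either side of it are complex conjugates of each other, which gives a quick sanity check on any port of the routine.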
/*Routine to compute a 1K complex FFT using four VSP chips.

Operation requires two phases of operation, one to calculate column
FFTs, the other to calculate row FFTs with twiddle factors.  The
column phase can be performed by calling the FFT32COL routine just
as for a 32x32 2D FFT.  The routine for the row phase differs between
chips because the twiddle factors required are different.  This
program can generate all four routines by running it with different
settings for the macro CHIP.

Using multiple VSP chips requires synchronization between phases so
that data can be exchanged.  This can be provided by a VSP routine
that synchronizes between calling FFT32COL and FFT1K, or the 68020
can invoke the first phase and wait for it to finish before invoking
the second.

Each chip should be passed an input pointer to row (CHIP * 8) with
CHIP equalling 0, 1, 2, or 3, depending on the chip.  The output
pointer should be to column (CHIP * 8) since the results must be
transposed to convert column and row bit-reversals into an overall
bit-reversal.  Each chip handles the 8 rows (turning into columns)
starting at that point.  Adding a parameter to give the number of
rows or columns to do would allow the same routine to be used by 1
or 2 chips without needing to make multiple calls.

The parameter In_Data points to the input vector.  The output vector
is placed at Out_Data.  The operation cannot be performed in place
because of the needed transpose.  The column pass can be performed
in place to avoid needing a buffer area for the intermediate results.

To get an inverse FFT, just change the subroutine name and change the
FFT instructions to IFFT instructions.

*/

/*chip number */
#define CHIP 0

/*function name for this chip, change for each */
#define FUNCNAME FFT1K0

zsp325()
{
/#
/*FFT for rows, eight 32 point FFTs with twiddle factors */
SUBROUTINE FUNCNAME(zr325ref In_Data, zr325ref Out_Data)
{
    /*set up two RAM sections, swapped by $LC */
    SET [ =INMS, =XOR ];

    /*set pointers to input and output, compensate for increment */
    /*note: depending on parameter order to get In_Data into $A */
    LDR Out_Data -> [$B, $A];
    SUBR $B, #2;

    /*initialize loop count */
    LDR #7 -> $LC;

    /*start up FFT with first RAM bank */
    LD_C:(32) $A -> $C1;

    /*increase initial twiddle factor in RBA by 8 rows per chip */
    FFT_C:(32) $C1, $ROM=(CHIP*8*16):0;

    /*loop 7 times, XOR with $LC alternates RAM */
    LD_C:(32) $A+=64 -> $C0;

    /*use RBA, increasing it for each set of 32 */
    /*increment of 16 puts it at 1 on last pass */
    FFT_C:(32) $C0, $ROM+=16:0;
    ST_C:(32) $C1 -> $B+=2:(32,1)~;
    LOOP:[DL:1] [ILZ], #3;

    /* save last RAM bank */
    ST_C:(32) $C1 -> $B+=2:(32,1)~;
}
#/
}
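
The column-then-row decomposition with twiddle factors that the 1K FFT header describes can be verified on a host with a short model. The Python sketch below is an illustration under stated assumptions: the twiddle multiplication is written as a separate step before each row transform (on the chip it is folded into the FFT instruction via the ROM base address), a plain DFT stands in for the hardware FFT, and the names `dft` and `fft_by_rows_and_columns` are hypothetical. The transposed store in the last loop corresponds to the transpose the header says is required.

```python
import cmath


def dft(x):
    """Direct discrete Fourier transform of a list of complex numbers."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def fft_by_rows_and_columns(x, n1, n2):
    """Length n1*n2 DFT built from column DFTs, twiddles, and row DFTs."""
    n = n1 * n2
    # View the input as n1 rows of n2 sequential elements.
    a = [[x[r * n2 + c] for c in range(n2)] for r in range(n1)]
    # Phase 1: column transforms of length n1.
    for c in range(n2):
        col = dft([a[r][c] for r in range(n1)])
        for r in range(n1):
            a[r][c] = col[r]
    # Phase 2: per-row twiddle factors, then row transforms of length n2.
    out = [0j] * n
    for k1 in range(n1):
        row = [a[k1][c] * cmath.exp(-2j * cmath.pi * c * k1 / n)
               for c in range(n2)]
        row = dft(row)
        for k2 in range(n2):
            out[k1 + n1 * k2] = row[k2]   # transposed store
    return out
```

Because the twiddle factors depend on the row index `k1`, each chip's block of rows needs a different starting twiddle, which matches the per-chip CHIP macro in the listing. Calling `fft_by_rows_and_columns(x, 32, 32)` models the 1K case; smaller sizes are convenient for checking against `dft(x)`.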
/*Program to perform real correlation between two real vectors with up to
64 elements in the shorter one and up to 1024 elements in the output
using a single Zoran processor.  Due to requirements of the instruction
used, the longer real vector must be padded at both ends with (shorter
length - 1) real zero elements.  These are needed for when the shorter
vector extends beyond the end of the longer during the operation.  If the
vectors are the same length, either may be considered the longer one.

The length of the short vector is passed in the parameter Coef_Length.
The length of the desired output vector (typically equal to the sum of
the lengths of the input vectors, minus one) is passed in Out_Length.
Coefficients points to the short input vector.  In_Data points to
the first zero pad in the longer input vector.  The output is placed at
Out_Data.  The output data could be stored in the place of the first
input vector if desired.  Typical call to perform a full autocorrelation
in place with a 64 (padded to 190) element vector:
    CALL RCORR(64, 127, &in, &(in+63), &in)
The (in+63) skips the padding at the front of the vector.

Note: if this routine will always be used for two equal length vectors,
only one length parameter is needed.  The other can be computed from it
with some extra overhead.  On the other hand, if this routine will be
used repeatedly for the same length, sending a precomputed $PR value
instead of a length would reduce overhead slightly.
*/

zsp325()
{
/#
SUBROUTINE RCORR(zr325int Coef_Length,
                 zr325int Out_Length,
                 zr325ref Coefficients,
                 zr325ref In_Data,
                 zr325ref Out_Data)
{
    /*set up mode properly, one RAM section, 24 bit integers */
    SET [ =RMS, =IXOR, =IFMT ];

    /*set $SAR to put output in correct place */
    LDR Out_Data -> $SAR;

    /*to get real coefficients in zig-zag order, need to load half
      as many (rounded up) "complex" coefficients
    */
    SHLSETR:[SHIFT=17] Coef_Length -> $PR;
    ADDR $PR, #0x020000;

    /*load coefficients in zig-zag real order */
    LD_C:($NMPT) *Coefficients -> $C0;

    /*set up lengths for reals
      $NMPT = Coef_Length, $REPEAT = Out_Length
    */
    SHLSETR:[SHIFT=18] Coef_Length -> $PR;
    ADDR $PR, Out_Length;

    /*correlate with input sequence */
    FIR_R:($NMPT,$REPEAT) $C0, *In_Data;
}
#/
}
/*Test program to see if Zorans work */

/*absolute base addresses from memory map */
#define PRAM 0x00000
#define FOUR_PORT 0x20000
#define STATUS_LATCH 0x40000

zsp325()
{
    int i;
    float x;

    /*put a vector of (1.0, x) at PRAM + 0x400 */
/#
    .ORG (PRAM + 0x400)
    for (i = 0, x = 0.0; i < 16; i++, x += 1.0)
    .DATA { 1.0, IEEE_Float(x) };
#/

    /*put a vector of (x, 1.0) at FOUR_PORT */
/#
    .ORG FOUR_PORT
    for (i = 0, x = 0.0; i < 16; i++, x += 1.0)
    .DATA { IEEE_Float(x), 1.0 };
#/

/#
.ORG
SUBROUTINE MAIN()
{
    /*write 0 to status latch */
    LDR #0 -> $X;
    STR $X -> STATUS_LATCH;

    /*add two complex vectors and store */
    LD_C:(16) (PRAM + 0x400) -> $C0;
    ADD_C:(16) FOUR_PORT, $C0 -> $C0;
    ST_C:(16) $C0 -> FOUR_PORT;

    /*make sure we are finished, then write 1s to status latch */
    SYNC:[CU,EU,MU];
    LDR #3 -> $X;
    STR $X -> STATUS_LATCH;
}
#/
}

File: B:TEST1.ASM
Last Modified: Tue Jun 30 14:48:46 1992


/*Test program to make Zoran status bits follow 68020 bits */

/*absolute base addresses from memory map */
#define PRAM 0x00000
#define FOUR_PORT 0x20000
#define STATUS_LATCH 0x40000

zsp325()
{
/#
SUBROUTINE MAIN()
{
Top::
    LDR STATUS_LATCH -> $LC;
    STR $LC -> STATUS_LATCH;

Loop::
    XORR:[TR] STATUS_LATCH, $LC -> $X;
    ANDR $X;
    JMPC:[IE:0] [ZR], Loop;
    JMP Top;
}
#/
}

File: B:STATUS.ASM
Last Modified: Mon Jun 29 16:10:12 1992


Date: 6/30/92

Size: 510


p4 1 A •  n  •  ikQM 

Last  Modified:  Tue  Jun  30  14:46:28  1992 


/*Code to start all VSP chips simultaneously.  The start address of the
code to be executed at the signal should be the first value on the
stack.
*/

/*absolute base addresses from memory map */
#define PRAM 0x00000
#define FOUR_PORT 0x20000
#define STATUS_LATCH 0x40000

/*status bit value to indicate start */
#define START 2

zsp325()
{
/#
SUBROUTINE START()
{
    /*get mask for status bit */
    LDR #START -> $X;
Poll::
    ANDR:[TR] STATUS_LATCH, $X;
    JMPC [ZR], Poll;
}
#/
}


Date: 7/20/92

File: B:ALT_FFT.ASM


/* Alternate routines to compute 32x32 2D complex FFT using four VSP chips.

The pipe_p2r routine produces incorrect results when the IE (immediate
execution) qualifier is used on its software pipeline's loop instruction.
Whatever mechanism causes this doesn't seem to affect the FFT routines
that use the same qualifier.  However, if for some reason it does so,
the routines can be rewritten to avoid using the qualifier.  Just taking
the qualifier out of the existing code will reduce performance by around
15%.  This is because the existing loop overlaps the FFT instruction
with the following store, the loop instruction itself, and the load in
the next loop iteration.  Removing the IE qualifier causes the loop
instruction to wait until the FFT instruction is complete and therefore
prevents overlap of the FFT instruction with the loop instruction and
more importantly, with the load in the next iteration.  By moving the
"kernel" of the software pipeline down one instruction, the load moves
past the loop instruction into the current iteration.  This allows the
load to overlap the FFT instruction even though the loop instruction
cannot.  Moving the kernel down one instruction requires alterations to
the preamble and postamble of the loop.  Since these alterations cause
the combination of the preamble and postamble to execute two iterations
instead of one, the loop count must be decreased by two instead of one.
This alternative version of the 32x32 FFT can be used as an example of
the modifications that are needed.
*/


zsp325()
{
/#
/* FFT for rows, eight 32 point FFTs on sequential data */
SUBROUTINE FFT32ROW(zr325ref In_Data, zr325ref Out_Data)
{
    /* set up two RAM sections, swapped by $LC */
    SET [ =INMS, =XOR ];

    /* set pointers to input and output, compensate for increment */
    /* note: depending on parameter order to get In_Data into $A */
    LDR Out_Data -> [$B, $A];
    SUBR $B, #64;

    /* initialize loop count */
    LDR #6 -> $LC;

    /* preamble */
    LD_C:(32) $A -> $C1;
    FFT_C:(32) $C1;
    LD_C:(32) $A+=64 -> $C0;

    /* loop 6 times, XOR with $LC alternates RAM */
    FFT_C:(32) $C0;
    ST_C:(32) $C1 -> $B+=64:(1,32~);
    LD_C:(32) $A+=64 -> $C1;
    LOOP:[DL] [ILZ], #3;

    /* postamble */
    FFT_C:(32) $C0;
    ST_C:(32) $C1 -> $B+=64:(1,32~);
    ST_C:(32) $C0 -> $B+=64:(1,32~);
}

/* FFT for columns, eight 32 point FFTs on interleaved data */
SUBROUTINE FFT32COL(zr325ref In_Data, zr325ref Out_Data)
{
    /* set up two RAM sections, swapped by $LC */
    SET [ =INMS, =XOR ];

    /* set pointers to data, compensate for first increment */
    /* note: depending on parameter order to get In_Data into $A */
    LDR Out_Data -> [$B, $A];
    SUBR $B, #2;

    /* initialize loop count */
    LDR #6 -> $LC;

    /* preamble */
    LD_C:(32) $A:(32,1) -> $C1;
    FFT_C:(32) $C1;
    LD_C:(32) $A+=2:(32,1) -> $C0;

    /* loop 6 times, XOR with $LC alternates RAM */
    FFT_C:(32) $C0;
    ST_C:(32) $C1 -> $B+=2:(32,1)~;
    LD_C:(32) $A+=2:(32,1) -> $C1;
    LOOP:[DL] [ILZ], #3;

    /* postamble */
    FFT_C:(32) $C0;
    ST_C:(32) $C1 -> $B+=2:(32,1)~;
    ST_C:(32) $C0 -> $B+=2:(32,1)~;
}
#/
}
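
The reordered software pipeline (preamble, a kernel executed six times, postamble) can be checked for correctness independently of the hardware. The Python sketch below is a scheduling model only, with hypothetical names `alternate_schedule` and `check`. It assumes eight blocks, two RAM banks that alternate from block to block (the listing achieves this by XOR with $LC; here a block's parity stands in for the bank number), and it ignores timing and overlap. It confirms that every block is loaded, transformed, and stored exactly once, and that no bank is reloaded before its previous contents were stored.

```python
def alternate_schedule(num_blocks=8):
    """Return the (operation, block) steps of the reordered pipeline."""
    # Preamble: first load and FFT, plus the load for the second block.
    steps = [("LD", 0), ("FFT", 0), ("LD", 1)]
    # Kernel, executed num_blocks - 2 times: FFT, store previous, load next.
    for i in range(num_blocks - 2):
        steps += [("FFT", i + 1), ("ST", i), ("LD", i + 2)]
    # Postamble: last FFT and the two stores still outstanding.
    last = num_blocks - 1
    steps += [("FFT", last), ("ST", last - 1), ("ST", last)]
    return steps


def check(steps, num_blocks=8):
    """Verify operation order and that the two banks are never clobbered."""
    bank = {}    # bank number -> block currently held
    state = {}   # block -> last operation applied to it
    for op, block in steps:
        b = block % 2                      # banks alternate between blocks
        if op == "LD":
            # Bank must be empty or its previous block already stored.
            assert b not in bank or state[bank[b]] == "ST"
            assert block not in state
            bank[b] = block
        elif op == "FFT":
            assert state[block] == "LD" and bank[b] == block
        else:
            assert state[block] == "FFT" and bank[b] == block
        state[block] = op
    return all(state.get(k) == "ST" for k in range(num_blocks))
```

With eight blocks the kernel runs six times, which matches the loop count of 6 loaded into $LC in the listing, two less than the block count as the header comment explains.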

Date: 7/20/92

File: B:CCONV.ASM

/* VSP code for convolution of a complex sequence of up to 32 points with
another longer complex sequence, producing up to 1024 outputs.  This
size can be done with a single FIR instruction.  This code can be
called repeatedly on a single processor to handle convolutions where
more than 1024 output points are required as long as the shorter
sequence is still no more than 32 points.  However, a different routine
designed for a longer convolution would be more efficient.  This
same code can be used on multiple VSP chips simultaneously to give
a considerable speed increase.  There may be no benefit to executing
on more than one VSP chip per bus since the FIR instruction may not
give up the bus between output points.

To get a full convolution of the input requires padding both ends of
the longer input sequence with a number of complex zeroes equal to the
length of the shorter sequence minus one.  This is required in order
to explicitly provide the zeroes that are assumed to be multiplied by
elements of the shorter sequence that extend beyond the ends of the
longer one during the convolution process.  The length of the output
sequence should be equal to the sum of the lengths of the (unpadded)
input sequences minus one.  If a circular convolution is desired
instead of a linear one, the zero padding should be replaced with
points from the other end of the input sequence.

The shorter input length is passed in Coef_Length.  The output length
(equal to input length before padding plus coefficient length minus one)
is passed as Out_Length.  Coefficients points to the shorter sequence.
In_Data points to the start of the longer sequence (possibly a zero
pad).  The output is placed at Out_Data.  Typical call:
    CALL CCONV(4, 1024, &Coef, &In, &Out)

The convolution can be performed in place with careful choices of
parameter values.  If the convolution requires multiple calls on a
single VSP chip, the output must begin at the first location of the
long input.  This avoids overwriting inputs that will be needed for
the next call.  However, if multiple chips are being used, the output
must overwrite the last input used in its computation.  This works
because the VSP chip has already read the input into internal RAM
for further use.  It is necessary because that input is the first
one which will not be needed by the chip working on the previous
portion of the convolution.  Some further care is needed in the
initial startup of in-place multiple chip convolution to ensure that
a chip does not write over any input values before the subsequent
chip reads them in.  A multiple call, multiple chip convolution
cannot be done in place because the constraints are contradictory.
However, such a large data set would not fit into shared memory.

Splitting up a convolution between NUM_CHIPS chips would require
something like the following invocation for chip ranging from zero
to (NUM_CHIPS-1):

    CALL CCONV(COEF_LEN, OUT_SIZE(chip), &Coef,
               &(In + 2*DATA_OFFSET(chip)), &(Out + 2*DATA_OFFSET(chip)));

with the definitions

    #define OUT_LEN (IN_LEN + COEF_LEN - 1)
    #define DATA_OFFSET(CHIP) (((CHIP) * OUT_LEN) / NUM_CHIPS)
    #define OUT_SIZE(CHIP) (DATA_OFFSET(CHIP+1) - DATA_OFFSET(CHIP))

DATA_OFFSET is doubled when used with pointer parameters because
each complex element requires two machine words.
*/

zsp325()
{
/#
SUBROUTINE CCONV(zr325int Coef_Length,
                 zr325int Out_Length,
                 zr325ref Coefficients,
                 zr325ref In_Data,
                 zr325ref Out_Data)
{
    /* set up mode properly, one RAM bank, 24 bit integers */
    SET [ =RMS, =IXOR, =IFMT ];

    /* set $SAR to put output in correct place */
    LDR Out_Data -> $SAR;

    /* now set up lengths for LD and FIR instructions */
    SHLSETR:[SHIFT=18] Coef_Length -> $PR;
    ADDR $PR, Out_Length;

    /* load coefficients in reverse order */
    SHLSETR:[SHIFT=1] Coef_Length -> $A;
    ADDR $A, Coefficients;
    SUBR $A, #2;
    LD_C:($NMPT) $A:(-1,1) -> $C0;

    /* convolve with input sequence */
    FIR_C:($NMPT,$REPEAT) $C0, *In_Data;
}
#/
}
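
The DATA_OFFSET and OUT_SIZE definitions in the CCONV header divide the output range among the chips using integer division. The Python sketch below is a host-side check of that partitioning under simplifying assumptions: the function names are hypothetical, "chips" are simulated by a loop, offsets are counted in elements (the doubling for two-word complex elements is omitted), and the convolution is done by sliding the reversed coefficients across the zero-padded input, in the manner the listing describes for the FIR instruction.

```python
def data_offset(chip, out_len, num_chips):
    """Index of the first output element handled by the given chip."""
    return (chip * out_len) // num_chips


def out_size(chip, out_len, num_chips):
    """Number of output elements handled by the given chip."""
    return (data_offset(chip + 1, out_len, num_chips)
            - data_offset(chip, out_len, num_chips))


def convolve_slice(coef, padded, start, count):
    """Outputs start .. start+count-1 of the linear convolution."""
    m = len(coef)
    rev = coef[::-1]                       # coefficients in reverse order
    return [sum(rev[k] * padded[n + k] for k in range(m))
            for n in range(start, start + count)]


def split_convolution(coef, data, num_chips=4):
    """Full linear convolution assembled from one slice per chip."""
    m = len(coef)
    out_len = len(data) + m - 1
    # Pad both ends with (shorter length - 1) zeros.
    padded = [0] * (m - 1) + list(data) + [0] * (m - 1)
    out = []
    for chip in range(num_chips):
        out += convolve_slice(coef, padded,
                              data_offset(chip, out_len, num_chips),
                              out_size(chip, out_len, num_chips))
    return out
```

Because each chip's size is the difference of consecutive offsets, the slices are contiguous and their sizes always add up to OUT_LEN, even when OUT_LEN is not a multiple of the chip count.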


Data:  1/20/92 


rila:  B:Ca»«.AaM 


1  -  /• 
3  - 

3  - 

4  - 

5  - 

6  - 

7  - 

8  - 
9  - 

10  - 
11  - 
13  - 

13  - 

14  - 

15  - 

16  - 

17  - 

18  - 
19  - 

30  - 

31  - 
33  - 

33  - 

34  - 

35  -  •/ 


Prograa  to  parfoxa  coaplac  corralatlon  battMan  two  ccapln  vactora  wltb 
up  to  33  alaaanta  in  tha  abortar  oaa  and  up  to  1034  alaaanta  in  tba  output 
uaing  a  aingla  zoran  procaaaor.  Dua  to  raquiraaanta  of  tba  Inatjnictlon 
uaad,  tba  longar  coaplax  vaetor  auat  ba  paddad  at  both  anda  with  (abortar 
langtb  -  1)  coaplax  taro  alaaanta.  Tbaaa  ara  naadad  for  wban  tba  abortar 
vaotor  axtanda  bayond  tba  and  of  tba  longar  during  tba  oparatloo.  If  tba 
vaotora  ara  tba  aaaa  langtb,  altbar  any  ba  conaldarad  tba  longar  ona. 

Tba  langtb  of  tba  abort  vaetor  la  panaad  in  tba  paraaatar  Coaf_Laogtb. 
nia  langtb  of  tba  daalrad  output  vaetor  (typically  aqual  to  tba  oua  of 
tba  langtba  of  tba  input  vactora,  alnua  ona)  ia  paaaad  in  Out_Langtb. 

Sba  abort  input  vaetor  la  polntad  to  by  Coafflclanta.  Tba  paraaatar 
ln_Data  polnta  at  tba  flrat  xaro  pad  of  tba  longer  input  vaotor. 

Tba  output  la  placed  at  Out_Data.  Tba  output  data  could  ba  atorad  in 
tba  plaoa  of  tba  flrat  input  vaetor  if  daalrad.  Typical  call  to  parfora 
a  full  autooorralatlon  In  place  wltb  a  33  (paddad  to  94)  alaaant  vactori 
CblL  COOI«(33,  63,  Cln,  t(lnt31),  6ln) 

Tba  (lnt31)  nklpn  tba  padding  at  tha  front  of  the  vector. 

■ota:  If  thla  routine  will  alwaya  ba  uaad  for  two  aqual  langtb  vactora, 
only  ona  length  parnaatar  ia  naadad.  Tba  otbar  can  ba  ccaputad  frea  it 
%rltb  noaa  extra  ovarbaad.  On  tha  other  hand.  If  thla  routine  will  >>a 
uaad  rapaatadly  for  tba  aaaa  length,  aandlng  a  pracaaq>utad  $PR  value 
Inataad  of  a  length  would  reduce  overhead  allghtly. 


zap325()
{
/#
SUBROUTINE CCORR( zr325int Coef_Length,
                  zr325int Out_Length,
                  zr325ref Coefficients,
                  zr325ref In_Data,
                  zr325ref Out_Data)
{
    /* set up mode properly, one RAM section, 24 bit */
    SET [ -MMS, -IXOR, -IMC ];

    /* set $SAR to put output in correct place */
    LDR Out_Data -> $SAR;

    /* load vector lengths into parameter register
       $IMPT = Coef_Length, $REPEAT = Out_Length
    */
    SHLSHR:[SHIFT=18] Coef_Length -> $PR;
    ADDR $PR, Out_Length;

    /* load complex conjugate of coefficients */
    LD_*C:($IMPT) *Coefficients -> $C0;

    /* correlate with input sequence */
    FIR_C:($IMPT, $REPEAT) $C0, *In_Data;
}
#/
}
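
The header comment of the correlation listing describes padding the longer vector with (shorter length - 1) complex zeros at both ends and conjugating the coefficients. The following Python sketch shows that arithmetic in software only; it does not model the VSP instruction itself, and the function name and test vector are hypothetical.

```python
def padded_correlate(short, long_):
    """Correlate `short` against `long_`, zero-padding `long_` at both ends.

    Returns len(long_) + len(short) - 1 complex values, where
    out[k] = sum_j conj(short[j]) * padded[k + j].
    """
    m, n = len(short), len(long_)
    pad = [0j] * (m - 1)
    padded = pad + [complex(v) for v in long_] + pad
    conj = [complex(v).conjugate() for v in short]
    return [
        sum(conj[j] * padded[k + j] for j in range(m))
        for k in range(n + m - 1)
    ]


if __name__ == "__main__":
    x = [1 + 1j, 2, -1j]
    # Full autocorrelation: 2*3 - 1 = 5 outputs, zero lag at index len(x) - 1.
    print(padded_correlate(x, x))
```

For an autocorrelation the zero-lag term sits at index (length - 1) and equals the total energy of the vector, which gives a quick sanity check on the padding.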


Date: 7/20/92

File: B:FFT1K.ASM


/*
Routine to compute a 1K complex FFT using four VSP chips.

Operation requires two phases of operation, one to calculate column
FFTs, the other to calculate row FFTs with twiddle factors. The
column phase can be performed by calling the FFT32COL routine just
as for a 32x32 2D FFT. The routine for the row phase differs between
chips because the twiddle factors required are different. This
program can generate all four routines by running it with different
settings for the macro CHIP.

Using multiple VSP chips requires synchronization between phases so
that data can be exchanged. This can be provided by a VSP routine
that synchronizes between calling FFT32COL and FFT1Kn, or the 68020
can invoke the first phase and wait for it to finish before invoking
the second.

Each chip should be passed an input pointer to row (CHIP * 8) with
CHIP equalling 0, 1, 2, or 3, depending on the chip. The output
pointer should be to column (CHIP * 8) since the results must be
transposed to convert column and row bit-reversals into an overall
bit-reversal. Each chip handles the 8 rows (turning into columns)
starting at that point. Adding a parameter to give the number of
rows or columns to do would allow the same routine to be used by 1
or 2 chips without needing to make multiple calls.

The parameter In_Data points to the input vector. The output vector
is placed at Out_Data. The operation cannot be performed in place
because of the needed transpose. The column pass can be performed
in place to avoid needing a buffer area for the intermediate results.

To get an inverse FFT, just change the subroutine name and change the
FFT instructions to IFFT instructions.
*/


/* chip number */
#define CHIP 0

/* function name for this chip, change for each */
#define FUNCNAME FFT1K0

zap325()
{
/#
/* FFT for rows, eight 32 point FFTs with twiddle factors */
SUBROUTINE FUNCNAME(zr325ref In_Data, zr325ref Out_Data)
{
    /* set up two RAM sections, swapped by $LC */
    SET [ -IMMS, -XOR ];

    /* set pointers to input and output, compensate for increment */
    /* note: depending on parameter order to get In_Data into $A */
    LDR Out_Data -> [$B, $A];
    SUBR $B, #2;

    /* initialize loop count */
    LDR #7 -> $LC;

    /* start up FFT with first RAM bank */
    LD_C:(32) $A -> $C1;
    /* increase initial twiddle factor in RBA by 8 rows per chip */
    FFT_C:(32) $C1, $ROM=(CHIP*8*16):0;

    /* loop 7 times, XOR with $LC alternates RAM */
    LD_C:(32) $A+=64 -> $C0;
    /* use RBA, increasing it for each set of 32 */
    /* increment of 16 puts it at 1 on last pass */
    FFT_C:(32) $C0, $ROM+=16:0;
    ST_C:(32) $C1 -> $B+=2:(32,1)~;
    LOOP:[IB,DL] [ILZ], #3;

    /* save last RAM bank */
    ST_C:(32) $C1 -> $B+=2:(32,1)~;
}
#/
}
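
The two-phase scheme described in the FFT1K header comment (column FFTs, then row FFTs with twiddle factors and a transposed store) is the standard row-column decomposition of a long FFT. The Python sketch below illustrates the arithmetic with a plain DFT standing in for the chip's FFT instruction; the function names are hypothetical, bit-reversed addressing is not modelled, and a 4x4 case is used in the demonstration to keep it fast.

```python
import cmath


def dft(v):
    """Direct O(n^2) DFT, used here in place of a hardware FFT."""
    n = len(v)
    return [
        sum(v[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def fft_by_columns_then_rows(x, side):
    """Length side*side DFT of x, viewed as a row-major side x side matrix."""
    n = side * side
    assert len(x) == n
    # Phase 1: FFT of every column; cols[c][k1] is the column-c result at bin k1.
    cols = [dft([x[side * r + c] for r in range(side)]) for c in range(side)]
    out = [0j] * n
    # Phase 2: apply twiddle factors along each row, FFT the row, store transposed.
    for k1 in range(side):
        row = [cols[c][k1] * cmath.exp(-2j * cmath.pi * c * k1 / n) for c in range(side)]
        spectrum = dft(row)
        for k2 in range(side):
            out[k1 + side * k2] = spectrum[k2]
    return out


def rows_for_chip(chip, side=32, num_chips=4):
    """Rows handled by one chip: 8 rows starting at CHIP * 8 in the 1K case."""
    per_chip = side // num_chips
    return range(chip * per_chip, (chip + 1) * per_chip)


if __name__ == "__main__":
    x = [complex(i, -0.5 * i) for i in range(16)]
    a, b = fft_by_columns_then_rows(x, 4), dft(x)
    print(max(abs(p - q) for p, q in zip(a, b)))
```

Row k1 of the intermediate matrix needs twiddle factors that depend on k1, so each chip works with a different set; this is consistent with the listing being assembled once per value of CHIP.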


Date: 7/20/92

File: B:FFT1K0.ASM


/* Routine to compute a 1K complex FFT using four VSP chips.

Operation requires two phases of operation, one to calculate column
FFTs, the other to calculate row FFTs with twiddle factors. The
column phase can be performed by calling the FFT32COL routine just
as for a 32x32 2D FFT. The routine for the row phase differs between
chips because the twiddle factors required are different. This
program can generate all four routines by running it with different
settings for the macro CHIP.

Using multiple VSP chips requires synchronization between phases so
that data can be exchanged. This can be provided by a VSP routine
that synchronizes between calling FFT32COL and FFT1Kn, or the 68020
can invoke the first phase and wait for it to finish before invoking
the second.

Each chip should be passed an input pointer to row (CHIP * 8) with
CHIP equalling 0, 1, 2, or 3, depending on the chip. The output
pointer should be to column (CHIP * 8) since the results must be
transposed to convert column and row bit-reversals into an overall
bit-reversal. Each chip handles the 8 rows (turning into columns)
starting at that point. Adding a parameter to give the number of
rows or columns to do would allow the same routine to be used by 1
or 2 chips without needing to make multiple calls.

The parameter In_Data points to the input vector. The output vector
is placed at Out_Data. The operation cannot be performed in place
because of the needed transpose. The column pass can be performed
in place to avoid needing a buffer area for the intermediate results.

To get an inverse FFT, just change the subroutine name and change the
FFT instructions to IFFT instructions.

*/


/* chip number */
#define CHIP 0

/* function name for this chip, change for each */
#define FUNCNAME FFT1K0

zap325()
{
/#
/* FFT for rows, eight 32 point FFTs with twiddle factors */
SUBROUTINE FUNCNAME(zr325ref In_Data, zr325ref Out_Data)
{
    /* set up two RAM sections, swapped by $LC */
    SET [ -IMMS, -XOR ];

    /* set pointers to input and output, compensate for increment */
    /* note: depending on parameter order to get In_Data into $A */
    LDR Out_Data -> [$B, $A];
    SUBR $B, #2;

    /* initialize loop count */
    LDR #7 -> $LC;

    /* start up FFT with first RAM bank */
    LD_C:(32) $A -> $C1;
    /* increase initial twiddle factor in RBA by 8 rows per chip */
    FFT_C:(32) $C1, $ROM=(CHIP*8*16):0;

    /* loop 7 times, XOR with $LC alternates RAM */
    LD_C:(32) $A+=64 -> $C0;
    /* use RBA, increasing it for each set of 32 */
    /* increment of 16 puts it at 1 on last pass */
    FFT_C:(32) $C0, $ROM+=16:0;
    ST_C:(32) $C1 -> $B+=2:(32,1)~;
    LOOP:[IB,DL] [ILZ], #3;

    /* save last RAM bank */
    ST_C:(32) $C1 -> $B+=2:(32,1)~;
}
#/
}


Date: 7/20/92

File: B:FFT1K1.ASM


/* Routine to compute a 1K complex FFT using four VSP chips.

Operation requires two phases of operation, one to calculate column
FFTs, the other to calculate row FFTs with twiddle factors. The
column phase can be performed by calling the FFT32COL routine just
as for a 32x32 2D FFT. The routine for the row phase differs between
chips because the twiddle factors required are different. This
program can generate all four routines by running it with different
settings for the macro CHIP.

Using multiple VSP chips requires synchronization between phases so
that data can be exchanged. This can be provided by a VSP routine
that synchronizes between calling FFT32COL and FFT1Kn, or the 68020
can invoke the first phase and wait for it to finish before invoking
the second.

Each chip should be passed an input pointer to row (CHIP * 8) with
CHIP equalling 0, 1, 2, or 3, depending on the chip. The output
pointer should be to column (CHIP * 8) since the results must be
transposed to convert column and row bit-reversals into an overall
bit-reversal. Each chip handles the 8 rows (turning into columns)
starting at that point. Adding a parameter to give the number of
rows or columns to do would allow the same routine to be used by 1
or 2 chips without needing to make multiple calls.

The parameter In_Data points to the input vector. The output vector
is placed at Out_Data. The operation cannot be performed in place
because of the needed transpose. The column pass can be performed
in place to avoid needing a buffer area for the intermediate results.

To get an inverse FFT, just change the subroutine name and change the
FFT instructions to IFFT instructions.

*/

/* chip number */
#define CHIP 1

/* function name for this chip, change for each */
#define FUNCNAME FFT1K1

zap325()
{
/#
/* FFT for rows, eight 32 point FFTs with twiddle factors */
SUBROUTINE FUNCNAME(zr325ref In_Data, zr325ref Out_Data)
{
Date: 7/20/92

File: B:FFT1K2.ASM


/*
Routine to compute a 1K complex FFT using four VSP chips.

Operation requires two phases of operation, one to calculate column
FFTs, the other to calculate row FFTs with twiddle factors. The
column phase can be performed by calling the FFT32COL routine just
as for a 32x32 2D FFT. The routine for the row phase differs between
chips because the twiddle factors required are different. This
program can generate all four routines by running it with different
settings for the macro CHIP.

Using multiple VSP chips requires synchronization between phases so
that data can be exchanged. This can be provided by a VSP routine
that synchronizes between calling FFT32COL and FFT1Kn, or the 68020
can invoke the first phase and wait for it to finish before invoking
the second.

Each chip should be passed an input pointer to row (CHIP * 8) with
CHIP equalling 0, 1, 2, or 3, depending on the chip. The output
pointer should be to column (CHIP * 8) since the results must be
transposed to convert column and row bit-reversals into an overall
bit-reversal. Each chip handles the 8 rows (turning into columns)
starting at that point. Adding a parameter to give the number of
rows or columns to do would allow the same routine to be used by 1
or 2 chips without needing to make multiple calls.

The parameter In_Data points to the input vector. The output vector
is placed at Out_Data. The operation cannot be performed in place
because of the needed transpose. The column pass can be performed
in place to avoid needing a buffer area for the intermediate results.

To get an inverse FFT, just change the subroutine name and change the
FFT instructions to IFFT instructions.
*/

/* chip number */
#define CHIP 2

/* function name for this chip, change for each */
#define FUNCNAME FFT1K2

zap325()
{
/#
/* FFT for rows, eight 32 point FFTs with twiddle factors */
SUBROUTINE FUNCNAME(zr325ref In_Data, zr325ref Out_Data)
{
    /* set up two RAM sections, swapped by $LC */
    SET [ -IMMS, -XOR ];

    /* set pointers to input and output, compensate for increment */
    /* note: depending on parameter order to get In_Data into $A */
    LDR Out_Data -> [$B, $A];
    SUBR $B, #2;

    /* initialize loop count */
    LDR #7 -> $LC;

    /* start up FFT with first RAM bank */
    LD_C:(32) $A -> $C1;
    /* increase initial twiddle factor in RBA by 8 rows per chip */
    FFT_C:(32) $C1, $ROM=(CHIP*8*16):0;

    /* loop 7 times, XOR with $LC alternates RAM */
    LD_C:(32) $A+=64 -> $C0;
    /* use RBA, increasing it for each set of 32 */
    /* increment of 16 puts it at 1 on last pass */
    FFT_C:(32) $C0, $ROM+=16:0;
    ST_C:(32) $C1 -> $B+=2:(32,1)~;
    LOOP:[IB,DL] [ILZ], #3;

    /* save last RAM bank */
    ST_C:(32) $C1 -> $B+=2:(32,1)~;
}
#/
}
/* Routine to compute a 1K complex FFT using four VSP chips.

Operation requires two phases of operation, one to calculate column
FFTs, the other to calculate row FFTs with twiddle factors. The
column phase can be performed by calling the FFT32COL routine just
as for a 32x32 2D FFT. The routine for the row phase differs between
chips because the twiddle factors required are different. This
program can generate all four routines by running it with different
settings for the macro CHIP.

Using multiple VSP chips requires synchronization between phases so
that data can be exchanged. This can be provided by a VSP routine
that synchronizes between calling FFT32COL and FFT1Kn, or the 68020
can invoke the first phase and wait for it to finish before invoking
the second.

Each chip should be passed an input pointer to row (CHIP * 8) with
CHIP equalling 0, 1, 2, or 3, depending on the chip. The output
pointer should be to column (CHIP * 8) since the results must be
transposed to convert column and row bit-reversals into an overall
bit-reversal. Each chip handles the 8 rows (turning into columns)
starting at that point. Adding a parameter to give the number of
rows or columns to do would allow the same routine to be used by 1
or 2 chips without needing to make multiple calls.

The parameter In_Data points to the input vector. The output vector
is placed at Out_Data. The operation cannot be performed in place
because of the needed transpose. The column pass can be performed
in place to avoid needing a buffer area for the intermediate results.

To get an inverse FFT, just change the subroutine name and change the
FFT instructions to IFFT instructions.

*/

/* chip number */
#define CHIP 3

/* function name for this chip, change for each */
#define FUNCNAME FFT1K3

zap325()
{
/#
/* FFT for rows, eight 32 point FFTs with twiddle factors */
SUBROUTINE FUNCNAME(zr325ref In_Data, zr325ref Out_Data)
{


    /* set up two RAM sections, swapped by $LC */
    SET [ -IMMS, -XOR ];

    /* set pointers to input and output, compensate for increment */
    /* note: depending on parameter order to get In_Data into $A */
    LDR Out_Data -> [$B, $A];
    SUBR $B, #2;

    /* initialize loop count */
    LDR #7 -> $LC;

    /* start up FFT with first RAM bank */
    LD_C:(32) $A -> $C1;
    /* increase initial twiddle factor in RBA by 8 rows per chip */
    FFT_C:(32) $C1, $ROM=(CHIP*8*16):0;

    /* loop 7 times, XOR with $LC alternates RAM */
    LD_C:(32) $A+=64 -> $C0;
    /* use RBA, increasing it for each set of 32 */
    /* increment of 16 puts it at 1 on last pass */
    FFT_C:(32) $C0, $ROM+=16:0;
    ST_C:(32) $C1 -> $B+=2:(32,1)~;
    LOOP:[IB,DL] [ILZ], #3;

    /* save last RAM bank */
    ST_C:(32) $C1 -> $B+=2:(32,1)~;
}
#/
}


Date: 7/20/92

File: B:FFT2D8.ASM


/* Program to compute 8x8 2D complex FFT using one VSP chip.

The parameter In_Data points to the input vector. The output vector
is placed at Out_Data. The operation can be performed in place if
desired. Both input and output vectors are in normal order.

To get an inverse FFT, just change the subroutine name and change the
FFT instructions to IFFT instructions.

To use real data, change LD_C to LD_(R,0).

Might be able to squeeze a little more speed out by starting with
two RAM sections, load first, FFT first row, load second, FFT second
rows, switch to one RAM section, FFT columns, store.

*/

zap325()
{
/#
SUBROUTINE FFT2D8(zr325ref In_Data, zr325ref Out_Data)
{
    /* set up one RAM section */
    SET [ -MMS, -IXOR ];

    /* load all 64 entries, with rows bit reversed */
    LD_C:(64) *In_Data:(8,8~) -> $0;

    /* FFT the rows, result in normal order */
    FFT_C:(8,8):[FPS:1,LPS:4] $0~, $ROM=0:512;

    /* FFT the columns, result in bit reversed order */
    FFT_C:(64):[FPS:32,LPS:8] $0;

    /* store result, bit reversing columns into normal order */
    ST_C:(64) $0 -> *Out_Data:(8~,8);
}
#/
}
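
The FFT2D8 listing above follows the usual row-column method: transform every row, then every column, using bit-reversed addressing on load and store to keep the data in normal order. The Python sketch below shows only the mathematics of that method plus a bit-reversal helper; a plain DFT replaces the FFT instruction, and all function names are hypothetical.

```python
import cmath


def dft(v):
    """Direct DFT standing in for the chip's FFT instruction."""
    n = len(v)
    return [
        sum(v[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of `index` (e.g. 1 -> 4 for 3 bits)."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def fft2d(matrix):
    """2D DFT of a square matrix: row transforms followed by column transforms."""
    size = len(matrix)
    rows = [dft(row) for row in matrix]
    cols = [dft([rows[r][c] for r in range(size)]) for c in range(size)]
    # cols[c][k] holds column c at vertical frequency k; transpose back.
    return [[cols[c][k] for c in range(size)] for k in range(size)]


if __name__ == "__main__":
    print([bit_reverse(i, 3) for i in range(8)])
    impulse = [[1 if (r, c) == (0, 0) else 0 for c in range(8)] for r in range(8)]
    print(fft2d(impulse)[3][5])
```

An impulse at the origin transforms to a constant matrix and a constant matrix transforms to a single peak at the origin, which makes both cases convenient checks.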


Date: 7/20/92

File: B:FFT2D16.ASM


/*
Routines to compute 16x16 2D complex FFT using four VSP chips.

Operation requires two phases of operation, one to calculate row
FFTs, the other to calculate column FFTs. Using multiple VSP chips
requires synchronization between phases so that data can be exchanged.
These routines do not include the synchronization. The routines for
each phase can be called from another routine which provides it between
calls, or the 68020 can invoke the first phase and wait for it to
finish before invoking the second.

Each VSP chip could calculate its four rows or columns in one instruction,
but using two RAM sections allows more concurrency. Each chip should
be passed data pointers to row or column (CHIP * 4) with CHIP equalling
0, 1, 2, or 3, depending on the chip.

The parameter In_Data points to the input vector. The output vector
is placed at Out_Data. The operation can be performed in place if
desired. Both input and output vectors are in normal order.

To get an inverse FFT, just change the subroutine name and change the
FFT instructions to IFFT instructions.

To use real data, either set the imaginary parts to zero to get a complex
vector, or change LD_C to LD_(R,0) to use a real vector. With a real
vector, this operation cannot be performed in place, since the output
data would overwrite unread input data.

*/

zap325()
{
/#
/* FFT for rows, four 16 point FFTs on sequential data */
SUBROUTINE FFT16ROW(zr325ref In_Data, zr325ref Out_Data)
{
    /* set up two RAM sections, no need for exchange */
    SET [ -IMMS, -IXOR ];

    /* set up pointers for later offset, $A gets In_Data */
    LDR Out_Data -> [$B, $A];

    /* load two rows into section 0 */
    LD_C:(32) $A -> $C0;

    /* FFT as two 16 element FFTs */
    FFT_C:(16,2):[FPS:8,LPS:1] $C0;

    /* load remainder of entries into section 1 */
    LD_C:(32) $B+64 -> $C1;


    /* FFT as two 16 element FFTs */
    FFT_C:(16,2):[FPS:8,LPS:1] $C1;

    /* store first result, row bit-reversed */
    ST_C:(32) $C0 -> $B:(16,16~);

    /* store second result, row bit-reversed */
    ST_C:(32) $C1 -> $B+64:(16,16~);
}

/* FFT for columns, four 16 point FFTs on interleaved data */
SUBROUTINE FFT16COL(zr325ref In_Data, zr325ref Out_Data)
{
    /* set up two RAM sections, no need for exchange */
    SET [ -IMMS, -IXOR ];

    /* set up pointers for later offset, $A gets In_Data */
    LDR Out_Data -> [$B, $A];

    /* load two columns interleaved into section 0 */
    LD_C:(32) $A:(16,2) -> $C0;

    /* FFT first set as two 16 element FFTs */
    FFT_C:(32):[FPS:16,LPS:2] $C0;

    /* load remainder of entries into section 1 */
    LD_C:(32) $A+4:(16,2) -> $C1;

    /* FFT as two 16 element FFTs */
    FFT_C:(32):[FPS:16,LPS:2] $C1;

    /* store first result, columns bit-reversed */
    ST_C:(32) $C0 -> $B:(16~,2);

    /* store second result, columns bit-reversed */
    ST_C:(32) $C1 -> $B+4:(16~,2);
}
#/
}


Date: 7/20/92

File: B:FFT2D32.ASM


/* Routines to compute 32x32 2D complex FFT using four VSP chips.

Operation requires two phases of operation, one to calculate row
FFTs, the other to calculate column FFTs. Using multiple VSP chips
requires synchronization between phases so that data can be exchanged.
These routines do not include the synchronization. The routines for
each phase can be called from another routine which provides it between
calls, or the 68020 can invoke the first phase and wait for it to
finish before invoking the second.

Each chip should be passed pointers to row or column (CHIP * 8) with
CHIP equalling 0, 1, 2, or 3, depending on the chip. It will handle
the 8 rows or columns starting at that point. Adding a parameter to
give the number of rows or columns to do would allow the same routine
to be used by 1 or 2 chips without needing to make multiple calls.

The parameter In_Data points to the input vector. The output vector
is placed at Out_Data. The operation can be performed in place if
desired. Both input and output vectors are in normal order.

To get an inverse FFT, just change the subroutine name and change the
FFT instructions to IFFT instructions.

To use real data, either set the imaginary parts to zero to get a complex
vector, or change LD_C to LD_(R,0) to use a real vector. With a real
vector, this operation cannot be performed in place, since the output
data would overwrite unread input data.

*/


zap325()
{
/#
/* FFT for rows, eight 32 point FFTs on sequential data */
SUBROUTINE FFT32ROW(zr325ref In_Data, zr325ref Out_Data)
{
    /* set up two RAM sections, swapped by $LC */
    SET [ -IMMS, -XOR ];

    /* set pointers to input and output, compensate for increment */
    /* note: depending on parameter order to get In_Data into $A */
    LDR Out_Data -> [$B, $A];
    SUBR $B, #64;

    /* initialize loop count */
    LDR #7 -> $LC;

    /* start up FFT with first RAM bank */
    LD_C:(32) $A -> $C1;
    FFT_C:(32) $C1;

    /* loop 7 times, XOR with $LC alternates RAM */
    LD_C:(32) $A+=64 -> $C0;
    FFT_C:(32) $C0;
    ST_C:(32) $C1 -> $B+=64:(1,32~);
    LOOP:[IB,DL] [ILZ], #3;

    /* save last RAM bank */
    ST_C:(32) $C1 -> $B+=64:(1,32~);
}

/* FFT for columns, eight 32 point FFTs on interleaved data */
SUBROUTINE FFT32COL(zr325ref In_Data, zr325ref Out_Data)
{
    /* set up two RAM sections, swapped by $LC */
    SET [ -IMMS, -XOR ];

    /* set pointers to data, compensate for first increment */
    /* note: depending on parameter order to get In_Data into $A */
    LDR Out_Data -> [$B, $A];
    SUBR $B, #2;

    /* initialize loop count */
    LDR #7 -> $LC;

    /* start up FFT with first RAM bank */
    LD_C:(32) $A:(32,1) -> $C1;
    FFT_C:(32) $C1;

    /* loop 7 times, XOR with $LC alternates RAM */
    LD_C:(32) $A+=2:(32,1) -> $C0;
    FFT_C:(32) $C0;
    ST_C:(32) $C1 -> $B+=2:(32,1)~;
    LOOP:[IB,DL] [ILZ], #3;

    /* save last RAM bank */
    ST_C:(32) $C1 -> $B+=2:(32,1)~;
}
#/
}
Date: 7/20/92

File: B:FINISH.ASM


/* Code to notify 68020 of task completion. This code is never actually
called from anywhere. Instead, its address is used as the return
address in the call frame that the 68020 sets up when invoking another
routine. When the routine completes and returns, it will execute this
code. This method allows all routines to be called without having
them terminate the task until final completion.

*/

/* status bit value to indicate finished */
#define FINISHED 2

zap325()
{
/#
SUBROUTINE FINISH()
{
    /* get value for status bits */
    LDR #FINISHED -> $X;

    /* make sure all operations are complete */
    SYNC:[AG,CU,BU,MU];

    /* write to global status latch */
    STR $X -> 0x40000;

    /* halt */
    HLT;
}
#/
}
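
The FINISH listing relies on the host placing the address of the completion code in the return slot of the call frame, so that any routine signals completion simply by returning. A toy Python model of that idea is sketched below; the class, its fields and the `scale` routine are illustrative assumptions and do not correspond to the real 68020 or VSP interfaces.

```python
FINISHED = 2  # status bit value, matching the #define in the listing


class ToyProcessor:
    """Very small stand-in for a signal processor driven by a host."""

    def __init__(self):
        self.stack = []         # call frames (only return addresses here)
        self.status_latch = 0   # models the global status latch
        self.halted = False
        self.results = []

    def finish(self):
        """Completion code: set the status bit, then halt."""
        self.status_latch |= FINISHED
        self.halted = True

    def invoke(self, routine, *args):
        """Host side: build a frame whose return address is finish(), then run."""
        self.status_latch = 0
        self.halted = False
        self.stack.append(self.finish)     # return address chosen by the host
        routine(self, *args)               # the routine knows nothing about finish()
        return_address = self.stack.pop()  # the routine "returns"
        return_address()


def scale(proc, data, factor):
    """Example routine; it only does its work and returns."""
    proc.results.append([factor * v for v in data])


if __name__ == "__main__":
    p = ToyProcessor()
    p.invoke(scale, [1, 2, 3], 2)
    print(p.results, p.status_latch, p.halted)
```

The design choice this illustrates is that routines stay reusable: the same routine can be called from another routine, where it returns normally, or invoked by the host, where returning lands in the completion code.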


File: B:PEAK2D.ASM

Date: 7/20/92


/* Routine to find peak values in a real matrix. By varying parameters, it
can produce a vector of the max value in each row or column or the max
value in the entire matrix. The calculation can be divided between
multiple VSP chips by giving each one a contiguous subset of the problem.
The maximum amplitude of a complex matrix can be found by first computing
the power (magnitude squared) and finding the maximum of that. If the
magnitude itself is required, it is probably still faster to find the
peaks first and then compute the magnitude for only those points rather
than computing all the magnitudes and finding the peaks.

The routine has a large number of parameters to allow it to be used in a
flexible manner. The parameter Number gives the number of separate
vectors (rows or columns) to find the maximum for. The parameter Length
gives the length of each vector (row or column). Length must be no more
than 1024 for this routine, though a slight modification would allow up
to 64K. The input parameter Spacing gives the distance between starting
elements of consecutive vectors. The input parameter Interleave gives
the distance between consecutive elements within a vector. Due to some
constraints on the $MBS_MSS register, bit 24 must also be set in the
parameter. Such a machine word can only be created at assembly time.
It can be created directly by using a parameter ARG(value) with the
macro definition

#define ARG(X) (0x1000000 | (X))

or by using a parameter that points to such a value created at assembly
time. As a slight compensation, a value other than 1 can be placed in
the field from bit 24 to 30. This value will be used as the $MBS value
while the rest of the Interleave value is used as $MSS. This allows
for each vector to be addressed more generally. If the $MBS_MSS register
already contains an appropriate value, it can be passed. The parameter
In_Data points to the start of the first input vector. The output will
be placed at Out_Data. The output will consist of a vector of length
Number of pairs of maximum values and the index between 0 and Length of
where that value appeared.


To find the maximum row values for a ROWxCOL matrix using N VSP chips
PEAK2D(COL/N, ROW, ROW, ARG(1), &(In+CHIP*ROW*COL/N), &(Out+CHIP*2*COL/N))

To find the maximum column values for a ROWxCOL matrix using N VSP chips
PEAK2D(ROW/N, COL, 1, ARG(ROW), &(In+CHIP*ROW/N), &(Out+CHIP*2*COL/N))

assuming that ROW and COL are evenly divisible by N. Using more than
one chip on each local bus will probably not improve performance because
the operation is bus-bandwidth bound. When using two chips, CHIP should
be set to 0 or 1 in the above formulas.

To find the overall maximum, treat as one long row using 1 VSP chip

48  - 

49  -  To  find  tba  overall  aaxlaua,  treat  as  ana  long  rw  using  1  VSP  chip 

50  -  PB*IC2D(1,  R0W*00L,  any_valua,  ARa(I),  6In,  (Out) 




   To find minimum values, just change the MAX instruction to MIN.

   Note: It is technically possible to accomplish the setting of the upper
   bits of $MBS_MSS at execution time with sufficient ingenuity. It requires
   using (slow) floating point operations to manipulate the higher bits. A
   lookup table is another possibility.


*/

zap325()
{
/#
SUBROUTINE PEAK2D( zr325int Number,
                   zr325int Length,
                   zr325int Spacing,
                   zr325val Interleave,
                   zr325ref In_Data,
                   zr325ref Out_Data)
{
/* set up automatic save to $SAR */
SET [ +SAE ];

/* set up parameters in correct registers
   note: LDRs depend on parameter order to put In_Data into
   $A, Interleave into $MBS_MSS and Number into $LC.
*/
LDR Out_Data -> [$SAR, $A, $MBS_MSS];
LDR Length -> [$PR, $LC];

/* loop Number times, handling Length each time, addressing properly */
MAX_R:($NMPT,$REPEAT) $A:($MSS,$MBS) -> SMRKX;
ADDR $A, Spacing;
LOOP:[IB,DL] [NLZ], #2;
}
#/
}
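The parameter conventions described for PEAK2D can be cross-checked with a plain C model. This is only a sketch under stated assumptions: `peak2d_ref` and `peak_pair` are hypothetical names, the data are ordinary floats in one array, Length is at least 1, and the bit-24 encoding applied by `ARG()` is ignored. It models the addressing, not the VSP instruction sequence.

```c
#include <stddef.h>

/* One output pair: the peak value and its index within its vector. */
typedef struct {
    float  value;
    size_t index;
} peak_pair;

/* Hypothetical reference model of the PEAK2D parameters:
 *   number     - how many vectors to scan
 *   length     - elements per vector (assumed >= 1)
 *   spacing    - distance between first elements of consecutive vectors
 *   interleave - distance between consecutive elements inside a vector
 */
static void peak2d_ref(size_t number, size_t length, size_t spacing,
                       size_t interleave, const float *in, peak_pair *out)
{
    for (size_t v = 0; v < number; v++) {
        const float *vec = in + v * spacing;
        peak_pair best = { vec[0], 0 };
        for (size_t k = 1; k < length; k++) {
            float x = vec[k * interleave];
            if (x > best.value) {   /* use '<' here for the MIN variant */
                best.value = x;
                best.index = k;
            }
        }
        out[v] = best;
    }
}
```

For six values stored as three contiguous vectors of two elements, `peak2d_ref(3, 2, 2, 1, m, out)` scans the contiguous vectors, `peak2d_ref(2, 3, 1, 2, m, out)` scans the two strided vectors, and `peak2d_ref(1, 6, 0, 1, m, out)` treats everything as one long vector.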


File: B:PIPE_P2R.ASM


Date: 7/20/92


/* Routine to perform polar to rectangular conversion on a complex vector.
   Uses separate sine and cosine tables. Could use one table for both,
   but that would require extra time. Only operates on angles in the first
   quadrant since those are the only ones produced by the rectangular to
   polar conversion. The table size will determine the accuracy of the
   conversion. The error will be less than 100% * pi / (4 * table size).

   The vector length is passed in the parameter Length. The parameter
   In_Data points to the start of the vector to be converted. The result
   is placed at Out_Data. This algorithm can be performed in place if
   desired.

   This routine uses software pipelining to maximize throughput. This
   should cause the bus to be busy most of the time. If two chips are
   performing this at the same time, there will not be enough bandwidth.
   Benchmarking will need to be used to determine whether this is faster
   than a version which does not attempt pipelining but uses larger blocks.

   Note: changing JMPC instructions to use IB qualifier causes incorrect
   results. It works correctly for other routines. Not using IB slows
   this routine down. Moving the software pipeline loop kernel so that
   the JMPC doesn't block a subsequent instruction that could be executed
   concurrently with the previous one would regain most of the speed.
*/

/* need trig functions for tables */
#include <math.h>

/* size of sine and cosine tables */
#define TAB_SIZE 128


/* size of increment between table entries */
#define INCREMENT (asin(1.0)/(TAB_SIZE-1))

/* assembly generation function */
zap325()
{
int index;

/* Generate trig function tables. */
/#
SinTab::
#/
for (index = 0; index < TAB_SIZE; index++)
{
/#
.DATA ( (IEEE_Float(sin(index*INCREMENT))) );
#/
}

49  -  #/ 



/#
CosTab::
#/
for (index = 0; index < TAB_SIZE; index++)
{
/#
.DATA ( (IEEE_Float(cos(index*INCREMENT))) );
#/
}

/#
SUBROUTINE POL2RECT(zr325int Length, zr325ref In_Data, zr325ref Out_Data)
{

/* use both RAM banks to optimize throughput */

/* Note: chosen interleaving pattern assumes LUT instruction
   makes no use of BU since it is a data movement instruction.
   Also assumes that arithmetic operations that use external
   operands can't be overlapped with move instructions, though
   this isn't clear.

   Benchmark might be needed to check the interleaving pattern.
*/

/* set up two RAM sections, swapped by $LC. round to nearest */
SET [ +MMS, +IXOR, -ROUND ];

/* load pointers to data, shifting $A to angle, compensate pre-inc */
ISETR In_Data -> $A;
LDR Out_Data -> $B;
SUBR $B, #64;

/* initialize loop count to number of 32s, skip loop if none */
SHRSETR:[SHIFT=5] Length -> $LC;
JMPC [ZR], Do_Rest;

/* start up conversion with first RAM bank */
/* load angle into imaginary part */
LD_I:(32) $A:(2,1) -> $I0;
/* multiply by factor to get table offset */
MULT_(R,R):(32) $C0, #(IEEE_Float(1.0/INCREMENT)) -> $I0;
/* convert to integer to get integer part right justified */
FPINT_R:(32) $I0 -> $I0;

/* if no more to do, skip rest of loop */
JMPC:[DL] [LZ], Do_Store;

/* loop with software pipelining */
Loop::
/* load and start next vector */
LD_I:(32) $A+=64:(2,1) -> $I0;



MULT_(R,R):(32) $C0, #(IEEE_Float(1.0/INCREMENT)) -> $I0;
/* do LUT operation for previous during execution of current */
LUT_R:(32) CosTab, $I1 -> $R1;
/* do next operation on current vector */
FPINT_R:(32) $I0 -> $I0;
/* finish and store previous vector */
LUT_R:(32) SinTab, $I1 -> $I1;
/* assume external operand fetch monopolizes bus unit */
MULT_(R,R):(32) $C1, $A-65:(2,1) -> $C1;
ST_C:(32) $C1 -> $B+=64;
/* decrement count (switches banks) and loop immediately if not done */
JMPC:[DL] [NLZ], Loop;

Do_Store::
/* finish up last RAM bank */
/* look up cosine of angle in table */
LUT_R:(32) CosTab, $I1 -> $R1;
/* look up sine of angle in table */
LUT_R:(32) SinTab, $I1 -> $I1;
/* multiply cosine and sine by magnitude to get real and imaginary */
MULT_(R,R):(32) $C1, $A-1:(2,1) -> $C1;
/* store resulting complex number in rectangular coordinates */
ST_C:(32) $C1 -> $B+=64;

Do_Rest::
/* handle any remainder left after blocks of 32 */

/* shift remainder into $NMPT, use [TC] to zero high bit (32s) */
SHLSETR:[SHIFT=18,TC] Length -> $PR;
JMPC [ZR], End;

/* finish remainder */
/* load angle into imaginary part */
LD_I:($NMPT) $A+=64:(2,1) -> $I0;
/* multiply by factor to get table offset */
MULT_(R,R):($NMPT) $C0, #(IEEE_Float(1.0/INCREMENT)) -> $I0;
/* convert to integer to get integer part right justified */
FPINT_R:($NMPT) $I0 -> $I0;
/* look up cosine of angle in table */
LUT_R:($NMPT) CosTab, $I0 -> $R0;
/* look up sine of angle in table */
LUT_R:($NMPT) SinTab, $I0 -> $I0;
/* multiply cosine and sine by magnitude to get real and imaginary */
MULT_(R,R):($NMPT) $C0, $A-1:(2,1) -> $C0;
/* store resulting complex number in rectangular coordinates */
ST_C:($NMPT) $C0 -> $B+=64;



End::

}
#/
}
File: B:POL2RECT.ASM


/* Routine to perform polar to rectangular conversion on a complex vector.
   Uses separate sine and cosine tables. Could use one table for both, but
   that would require extra time. Only operates on angles in the first
   quadrant since those are the only ones produced by the rectangular to
   polar conversion. Other angles will produce unexpected results. The
   table size will determine the accuracy of the conversion. The error
   will be less than 100% * pi / (4 * table size).

   Length of the vector to be converted is passed in Length. In_Data
   points to the start of the input vector. Output is placed at location
   Out_Data. Conversion can be performed in place if desired.

   This version assumes performance is bounded by local bus bandwidth and
   therefore doesn't attempt software pipelining alternating RAM banks.
   Instead it uses the entire RAM at once to minimize bus traffic for
   instruction fetching. This also makes the code more readable. Testing
   will be needed to see which method is faster. Using half of RAM and
   loading magnitude in other half before MULT might save more bandwidth.
*/

/* need trig functions for tables */
#include <math.h>

/* size of sine and cosine tables */
#define TAB_SIZE 128

/* size of increment between table entries */
#define INCREMENT (asin(1.0)/(TAB_SIZE-1))

/* assembly generation function */
zap325()
{
int index;

/* Generate trig function tables. */
/#
SinTab::
#/
for (index = 0; index < TAB_SIZE; index++)
{
/#
.DATA ( (IEEE_Float(sin(index*INCREMENT))) );
#/
}
/#
CosTab::
#/
for (index = 0; index < TAB_SIZE; index++)
{
/#
.DATA ( (IEEE_Float(cos(index*INCREMENT))) );
#/
}

/#
SUBROUTINE POL2RECT(zr325int Length, zr325ref In_Data, zr325ref Out_Data)
{

/* set up one RAM section, set rounding to nearest */
SET [ -MMS, -ROUND ];

/* load pointers to data, compensate for pre-increment */
/* increment $A at load so it points to angle part */
ISETR In_Data -> $A;
LDR Out_Data -> $B;
SUBR [$A, $B], #128;

/* initialize loop count to number of 64s, skip loop if none */
SHRSETR:[SHIFT=6] Length -> $LC;
JMPC [ZR], Do_Rest;

Loop::
/* load angle into imaginary part */
LD_I:(64) $A+=128:(2,1) -> $I;
/* multiply by factor to get table offset */
MULT_(R,R):(64) $C, #(IEEE_Float(1.0/INCREMENT)) -> $I;
/* convert to integer to get integer part right justified */
FPINT_R:(64) $I -> $I;
/* look up cosine of angle in table */
LUT_R:(64) CosTab, $I -> $R;
/* look up sine of angle in table */
LUT_R:(64) SinTab, $I -> $I;
/* multiply cosine and sine by magnitude to get real and imaginary */
MULT_(R,R):(64) $C, $A-1:(2,1) -> $C;
/* store resulting complex number in rectangular coordinates */
ST_C:(64) $C -> $B+=128;
/* decrement $LC, loop immediately on not zero */
JMPC:[DL,IB] [NLZ], Loop;

Do_Rest::
/* handle remainder left after blocks of 64 */

/* shift remainder into $NMPT, skip if none */
SHLSETR:[SHIFT=18] Length -> $PR;
JMPC [ZR], End;

/* finish remainder */
/* load angle into imaginary part */
LD_I:($NMPT) $A+=128:(2,1) -> $I;
/* multiply by factor to get table offset */
MULT_(R,R):($NMPT) $C, #(IEEE_Float(1.0/INCREMENT)) -> $I;
/* convert to integer to get integer part right justified */
FPINT_R:($NMPT) $I -> $I;
/* look up cosine of angle in table */
LUT_R:($NMPT) CosTab, $I -> $R;
/* look up sine of angle in table */
LUT_R:($NMPT) SinTab, $I -> $I;
/* multiply cosine and sine by magnitude to get real and imaginary */
MULT_(R,R):($NMPT) $C, $A-1:(2,1) -> $C;
/* store resulting complex number in rectangular coordinates */
ST_C:($NMPT) $C -> $B+=128;

End::

}
#/
}
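Both polar to rectangular listings process the vector in fixed-size blocks and then finish with one shorter pass over whatever is left. The C sketch below illustrates that control structure only. `split_length` and `scale_in_blocks` are hypothetical names, and the per-element work (doubling each value) is a stand-in for the real conversion.

```c
#include <stddef.h>

typedef struct {
    size_t blocks;     /* number of full blocks (the loop count)   */
    size_t remainder;  /* elements left over after the full blocks */
} block_split;

/* Split a length into full blocks of 2^log2_block elements plus a tail.
 * The shift mirrors the "number of 64s" (or 32s) loop count setup. */
static block_split split_length(size_t length, unsigned log2_block)
{
    block_split s;
    s.blocks    = length >> log2_block;
    s.remainder = length & (((size_t)1 << log2_block) - 1);
    return s;
}

/* Stand-in workload: double every element, one block at a time,
 * then handle the remainder exactly once. */
static void scale_in_blocks(float *data, size_t length, unsigned log2_block)
{
    block_split s = split_length(length, log2_block);
    size_t block = (size_t)1 << log2_block;

    for (size_t b = 0; b < s.blocks; b++) {     /* skipped if no full block */
        for (size_t k = 0; k < block; k++)
            data[k] *= 2.0f;
        data += block;
    }
    for (size_t k = 0; k < s.remainder; k++)    /* skipped if no remainder */
        data[k] *= 2.0f;
}
```

With a length of 200 and blocks of 64, the split is three full blocks and a remainder of 8.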


Date: 7/20/92


File: B:POWER.ASM


/*
Routine to compute magnitude squared for a complex vector. If the vector
is the FFT of a signal, this is the power spectrum of the signal. This
routine is faster than the rectangular to polar conversion and should be
used if the magnitude squared is as useful as the magnitude. For example,
the point of maximum magnitude is also the point of maximum power.

This routine can be performed in place, producing an output vector half
the length of the input. This would leave gaps if multiple VSP chips
were being used. If the calculation is not performed in place, or gaps
are acceptable, there is no problem using multiple chips to calculate
parts of the output vectors.

Note: this routine is I/O bound even on a single VSP. With two sharing
a bus, it will be even worse. If it is being used immediately after an
FFT operation, it would be more efficient to perform the magnitude
squared operation as the last step of an FFT routine before storing the
result. This would save a store and reload.

The input parameter Length contains the number of elements in the
input vector. The parameter In_Data points to the start of the
input vector. The output will be placed at Out_Data.
*/

zap325()
{
/#
SUBROUTINE POWER(zr325int Length, zr325ref In_Data, zr325ref Out_Data)
{
/* use both RAM banks to improve throughput */

/* set up two RAM sections, swapped by $LC */
SET [ +MMS, +IXOR ];

/* set up pointers to data areas, compensate $B for pre-increment */
/* Note: load depends on parameter order to get In_Data into $A */
LDR Out_Data -> [$B, $A];
SUBR $B, #32;

/* initialize loop count to number of 32s, skip loop if none */
SHRSETR:[SHIFT=5] Length -> $LC;
JMPC [ZR], Do_Rest;

/* start up with first RAM bank */
LD_C:(32) $A -> $C0;
MaaQ_R:(32) $C0 -> $R0;

/* if no more to do, skip rest of loop */
JMPC:[DL] [LZ], Do_Store;

/* loop with software pipeline; XOR with $LC alternates RAM */
LD_C:(32) $A+=64 -> $C0;
MaaQ_R:(32) $C0 -> $R0;
ST_R:(32) $R1 -> $B+=32;
LOOP:[DL] [NLZ], #3;

Do_Store::
/* save last RAM bank */
ST_R:(32) $R1 -> $B+=32;

Do_Rest::
/* handle remainder left after blocks of 32 */

/* shift remainder into $NMPT, use [TC] to zero high bit (32s) */
SHLSETR:[SHIFT=18,TC] Length -> $PR;
JMPC [ZR], End;

/* finish up remainder */
LD_C:($NMPT) $A+=64 -> $C0;
MaaQ_R:($NMPT) $C0 -> $R0;
ST_R:($NMPT) $R0 -> $B+=32;

End::

}
#/

}
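The arithmetic performed by the power routine is small enough to state directly in C. The following is a sketch under assumptions: `power_ref` is a hypothetical name, and the input is taken to be interleaved (real, imaginary) float pairs. It also shows why the in-place form is safe: output element n is written only after input elements 2n and 2n+1 have been read.

```c
#include <stddef.h>

/* in:  length complex values stored as (real, imaginary) pairs
 * out: length real values; out may be the same array as in, in which
 *      case the result occupies the first half of the input storage */
static void power_ref(size_t length, const float *in, float *out)
{
    for (size_t n = 0; n < length; n++) {
        float re = in[2 * n];
        float im = in[2 * n + 1];
        out[n] = re * re + im * im;   /* magnitude squared */
    }
}
```

Because the square of the magnitude grows monotonically with the magnitude, a peak search over this output finds the same index as a search over the magnitudes themselves.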


Date: 7/20/92


File: B:RCONV.ASM


/* VPH code for convolution of a real sequence of up to 64 points with
   another longer real sequence, producing up to 1024 outputs. This
   size can be done with a single FIR instruction. This code can be
   called repeatedly on a single processor to handle convolutions where
   more than 1024 output points are required as long as the shorter
   sequence is still less than 64 points. However, a different routine
   designed for a longer convolution would be more efficient. This
   same code can be used on multiple VSP chips simultaneously to give
   a considerable speed increase. There may be no benefit to executing
   on more than one VSP chip per bus because the FIR instruction may not
   give up the bus between output points.

   To get a full convolution of the input requires padding both ends of
   the longer input sequence with a number of zeroes equal to the length
   of the shorter sequence minus one. This is required in order to
   explicitly provide the zeroes that are assumed to be multiplied by
   elements of the shorter sequence that extend beyond the ends of the
   longer one during the convolution process. The length of the output
   sequence should be equal to the sum of the lengths of the (unpadded)
   input sequences minus one. If a circular convolution is desired
   instead of a linear one, the zero padding should be replaced with
   points from the other end of the input sequence.

   The shorter input length is passed in Coef_Length. The output length
   (equal to input length before padding plus coefficient length minus one)
   is passed as Out_Length. Coefficients points to the shorter sequence
   (typically FIR filter coefficients). In_Data points to the start
   of the longer sequence (possibly a zero pad). The output is placed
   at Out_Data. Typical call for a four tap filter:
       CALL RCONV(4, 1024, &Coef, &In, &Out)

   The convolution can be performed in place with careful choices of
   parameter values. If the convolution requires multiple calls on a
   single VSP chip, the output must begin at the first location of the
   long input. This avoids overwriting inputs that will be needed for
   the next call. However, if multiple chips are being used, the output
   must overwrite the last input used in its computation. This works
   because the VSP chip has already read the input into internal RAM
   for further use. It is necessary because that input is the first
   one which will not be needed by the chip working on the previous
   portion of the convolution. Some further care is needed in the
   initial startup of in-place multiple chip convolution to ensure that
   a chip does not write over any input values before the subsequent
   chip reads them in. A multiple call, multiple chip convolution
   cannot be done in place because the constraints are contradictory.
   However, such a large data set would not fit into shared memory.

   Splitting up a convolution between NUM_CHIPS chips would require
   something like the following invocation for chip ranging from zero
   to (NUM_CHIPS - 1):

       CALL RCONV(COEF_LEN, OUT_SIZE(chip), &Coef,
                  &(In + DATA_OFFSET(chip)), &(Out + DATA_OFFSET(chip)));

   with the definitions

   #define OUT_LEN (IN_LEN + COEF_LEN - 1)
   #define DATA_OFFSET(CHIP) (((CHIP) * OUT_LEN) / NUM_CHIPS)
   #define OUT_SIZE(CHIP) (DATA_OFFSET(CHIP+1) - DATA_OFFSET(CHIP))

   Note that since all this routine does is to load various values into
   internal registers and RAM and then execute a single instruction, it
   might be faster for the 68020 to load the values directly and execute
   the FIR instruction in slave mode. The same applies to the complex
   convolution and the correlations.
*/

zap325()
{
/#
SUBROUTINE RCONV(
    zr325int Coef_Length,
    zr325int Out_Length,
    zr325ref Coefficients,
    zr325ref In_Data,
    zr325ref Out_Data)
{
/* set up mode properly, one RAM bank, 24 bit integers */
SET [ -MMS, -IXOR, -IFMT ];

/* set $SAR to put output in correct place */
LDR Out_Data -> $SAR;

/* to get real coefficients in zig-zag order, need to load half
   as many (rounded up) "complex" coefficients
*/
SHLSETR:[SHIFT=17] Coef_Length -> $PR;
ADDR $PR, #0x020000;

/* load coefficients in reverse zig-zag real order */
LDR Coefficients -> $A;
ADDR $A, Coef_Length;
SUBR $A, #2;
LD_(I,R):($NMPT) $A:(-1,1) -> $C0;

/* now set up actual lengths for FIR instruction */
SHLSETR:[SHIFT=18] Coef_Length -> $PR;
ADDR $PR, Out_Length;

/* convolve with input sequence */
FIR_R:($NMPT,$REPEAT) $Z0, *In_Data;
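The padding rule and the multi-chip split described for RCONV can be checked with a short C model. This is a sketch under assumptions: `rconv_ref` and `rconv_split` are hypothetical names, the values chosen for `NUM_CHIPS`, `IN_LEN` and `COEF_LEN` are arbitrary examples, and the "chips" are simulated by calling the same function on successive slices. The three partition macros are taken from the listing, with parentheses added around `(CHIP) + 1`.

```c
#include <stddef.h>

/* Linear convolution over an input that already carries (coef_len - 1)
 * zeroes before and after the real samples.
 * out[n] = sum over k of coef[k] * x[n - k], where x[j] is stored at
 * padded[j + coef_len - 1]. */
static void rconv_ref(size_t coef_len, size_t out_len,
                      const float *coef, const float *padded, float *out)
{
    for (size_t n = 0; n < out_len; n++) {
        float acc = 0.0f;
        for (size_t k = 0; k < coef_len; k++)
            acc += coef[k] * padded[n + (coef_len - 1) - k];
        out[n] = acc;
    }
}

/* Example sizes (assumptions for illustration only). */
#define NUM_CHIPS 3
#define IN_LEN    8
#define COEF_LEN  3

/* Partition macros as given in the listing. */
#define OUT_LEN (IN_LEN + COEF_LEN - 1)
#define DATA_OFFSET(CHIP) (((CHIP) * OUT_LEN) / NUM_CHIPS)
#define OUT_SIZE(CHIP) (DATA_OFFSET((CHIP) + 1) - DATA_OFFSET(CHIP))

/* Each simulated chip produces its own contiguous slice of the output,
 * reading the padded input from the same offset. */
static void rconv_split(const float *coef, const float *padded, float *out)
{
    for (int chip = 0; chip < NUM_CHIPS; chip++)
        rconv_ref(COEF_LEN, OUT_SIZE(chip), coef,
                  padded + DATA_OFFSET(chip), out + DATA_OFFSET(chip));
}
```

With these example sizes the output length is 10 and the slices have 3, 3 and 4 points, so the integer division in `DATA_OFFSET` spreads the leftover point without leaving gaps or overlaps.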
}
#/
}


File: B:RCORR.ASM


/*
Program to perform real correlation between two real vectors with up to
64 elements in the shorter one and up to 1024 elements in the output
using a single Zoran processor. Due to requirements of the instruction
used, the longer real vector must be padded at both ends with (shorter
length - 1) real zero elements. These are needed for when the shorter
vector extends beyond the end of the longer during the operation. If the
vectors are the same length, either may be considered the longer one.

The length of the short vector is passed in the parameter Coef_Length.
The length of the desired output vector (typically equal to the sum of
the lengths of the input vectors, minus one) is passed in Out_Length.
Coefficients points to the short input vector. In_Data points to
the first zero pad in the longer input vector. The output is placed at
Out_Data. The output data could be stored in the place of the first
input vector if desired. Typical call to perform a full autocorrelation
in place with a 64 (padded to 190) element vector:

    CALL RCORR(64, 127, &In, &(In+63), &In)

The (In+63) skips the padding at the front of the vector.

Note: If this routine will always be used for two equal length vectors,
only one length parameter is needed. The other can be computed from it
with some extra overhead. On the other hand, if this routine will be
used repeatedly for the same length, sending a precomputed $PR value
instead of a length would reduce overhead slightly.
*/

zap325()
{
/#
SUBROUTINE RCORR(
    zr325int Coef_Length,
    zr325int Out_Length,
    zr325ref Coefficients,
    zr325ref In_Data,
    zr325ref Out_Data)
{
/* set up mode properly, one RAM section, 24 bit integers */
SET [ -MMS, -IXOR, -IFMT ];

/* set $SAR to put output in correct place */
LDR Out_Data -> $SAR;

/* to get real coefficients in zig-zag order, need to load half
   as many (rounded up) "complex" coefficients
*/
SHLSETR:[SHIFT=17] Coef_Length -> $PR;
ADDR $PR, #0x020000;

/* load coefficients in zig-zag real order */
LD_C:($NMPT) *Coefficients -> $C0;

/* load vector lengths into parameter register
   $NMPT = Coef_Length, $REPEAT = Out_Length
*/
SHLSETR:[SHIFT=18] Coef_Length -> $PR;
ADDR $PR, Out_Length;

/* correlate with input sequence */
FIR_R:($NMPT,$REPEAT) $Z0, *In_Data;
}
#/
}
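For comparison with the convolution routine, the correlation can be modelled in C as the same sliding sum without reversing the coefficients. This is a sketch under assumptions: `rcorr_ref` is a hypothetical name, and the lag ordering of the output (first output taken with the short vector lined up on the start of the padded input) is an assumption about the convention, not something the listing states.

```c
#include <stddef.h>

/* padded: longer vector with (coef_len - 1) zeroes before and after it.
 * out[n] = sum over k of coef[k] * padded[n + k].
 * In this C model coef must not overlap out; the VSP routine avoids the
 * issue by loading the coefficients into internal RAM first. */
static void rcorr_ref(size_t coef_len, size_t out_len,
                      const float *coef, const float *padded, float *out)
{
    for (size_t n = 0; n < out_len; n++) {
        float acc = 0.0f;
        for (size_t k = 0; k < coef_len; k++)
            acc += coef[k] * padded[n + k];
        out[n] = acc;
    }
}
```

For an autocorrelation the output is symmetric about its centre, and the centre value is the sum of squares of the input.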


Date: 7/20/92


File: B:RECIP.ASM


/* Routine to set up reciprocal table and one to generate inline code
   to compute the reciprocals for a vector. The algorithm is to perform
   a table lookup to get a starting estimate and then perform Newton-Raphson
   iterations until accuracy is 24 bits. This requires that
   TAB_BITS * (1 << NUM_ITER) >= 24.

   Might be better to split reciptab into a zap325 routine to create table
   and link in after assembly. The reciprocal function would still be
   included by the using routine. This would prevent including table
   more than once if it is used by multiple other routines.
*/

/* define number of bits of accuracy in table, table size, and iterations */
#define TAB_BITS 6
#define TAB_SIZE (1 << (TAB_BITS-1))
#define NUM_ITER 2

/* Function to create reciprocal table for initial estimate. Must be
   called once if reciprocals are to be used.
*/
void reciptab()
{
long i;
union {
float flt;
long bits;
} max, min;

/* generate label for start of table */
/#
RecipTab::
#/

/* generate the table entries */
for (i=0; i < TAB_SIZE; i++)
{
/* calculate max and min values that will use this entry */
min.bits = (127L << 23) + (i << (24 - TAB_BITS));
max.bits = (127L << 23) + ((i+1) << (24 - TAB_BITS));

/* use midpoint between their reciprocals to minimize error */
/#
.DATA ( IEEE_Float(0.5 / max.flt + 0.5 / min.flt) );
#/
}
}

/* Function to produce inline assembly to calculate the reciprocals for
   a vector in internal RAM. The internal RAM must be set up to have two
   banks and the input vector must be in R0. This limits the input vector
   length to 32 or less. The result vector ends up in R0. All internal
   RAM banks are overwritten with intermediate results.

   This function is essentially a macro. It is called from within a
   zap325() function and generates assembly code. It does not produce
   any calls that execute at run time. The function reciptab must also
   have been called by the zap325() function or there will be an error
   during assembly.
*/
void recip(int length)
{
int i;

/#
/* split into exponent and mantissa, negate exponent, trap zero */
SPLIT_R:((length)):[DV] $R0 -> $C1;

/* look up initial estimate of reciprocal of mantissa */
LUT_R:((length)):[SHIFT=(24-TAB_BITS)] RecipTab, $I1 -> $I0;

/* change sign of estimate to match initial input sign */
8Icai_R:((length)) $R0, $I0 -> $R0;
#/

/* generate Newton-Raphson iterations inline */
for (i = 0; i < NUM_ITER; i++)
{
/#
/* new estimate = estimate * (2.0 - estimate * input) */
SBN_R:((length)) $R0, $I1, #2.0 -> $I0;
MULT_R:((length)) $R0, $I0 -> $R0;
#/
}

/#
/* recombine resulting mantissa with exponent */
JOIN_R:((length)) $R1, $R0 -> $R0;
#/
}
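The table-plus-Newton-Raphson scheme can be exercised on a host machine with the C sketch below. It is written under assumptions: `build_recip_tab`, `recip_ref` and the two bit-cast helpers are hypothetical names, floats are assumed to be IEEE 754 single precision, the input must be a nonzero normal number whose biased exponent is below 254, and the sign is applied at the end rather than before the iterations, which gives the same result.

```c
#include <stdint.h>
#include <string.h>

#define TAB_BITS 6
#define TAB_SIZE (1 << (TAB_BITS - 1))
#define NUM_ITER 2

static float recip_tab[TAB_SIZE];

static float bits_to_float(uint32_t b)
{
    float f;
    memcpy(&f, &b, sizeof f);
    return f;
}

static uint32_t float_to_bits(float f)
{
    uint32_t b;
    memcpy(&b, &f, sizeof b);
    return b;
}

/* One entry per group of mantissas sharing their top TAB_BITS-1 bits:
 * the midpoint between the reciprocals of the group's two endpoints. */
static void build_recip_tab(void)
{
    for (uint32_t i = 0; i < TAB_SIZE; i++) {
        float lo = bits_to_float((127u << 23) + (i << (24 - TAB_BITS)));
        float hi = bits_to_float((127u << 23) + ((i + 1) << (24 - TAB_BITS)));
        recip_tab[i] = 0.5f / hi + 0.5f / lo;
    }
}

/* x must be nonzero and normal, with biased exponent in 1..253. */
static float recip_ref(float x)
{
    uint32_t b    = float_to_bits(x);
    uint32_t sign = b & 0x80000000u;
    uint32_t exp  = (b >> 23) & 0xFFu;
    uint32_t man  = b & 0x007FFFFFu;

    float m   = bits_to_float((127u << 23) | man);   /* mantissa in [1, 2) */
    float est = recip_tab[man >> (24 - TAB_BITS)];   /* initial estimate   */

    for (int i = 0; i < NUM_ITER; i++)
        est = est * (2.0f - est * m);   /* each pass doubles the good bits */

    /* negate the exponent: 2^-(exp-127) has biased exponent 254 - exp */
    float scale = bits_to_float((uint32_t)(254 - exp) << 23);
    float r = est * scale;
    return sign ? -r : r;
}
```

The initial estimate is good to about 6 bits, so two iterations give roughly 12 and then 24 bits, which matches the condition TAB_BITS * (1 << NUM_ITER) >= 24 stated in the header comment.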


1 


Oatai  7/20/92 


Flla:  B:KBCT2FaL.AaN 


1 - /* Routine to perform rectangular to polar conversion on a complex vector.
2 -    Uses a Cordic-like algorithm for magnitude and an arctangent lookup
3 -    table for angle in radians. Maximum error in magnitude is 2% for
4 -    three iterations, which can easily be reduced to a value as low as
5 -    0.0002% by increasing the number of iterations to eight. Maximum error
6 -    in angle is 2.33% of a quadrant (0.0366 radians) for 5 bits from each
7 -    mantissa, which requires a table of 1K entries for first quadrant angles
8 -    only. The table size must be quadrupled for each doubling in precision,
9 -    so this approach is not practical for high precision.
10 -
11 -    This program computes only first quadrant angles; other angles are
12 -    moved into the first quadrant by taking the absolute value of both
13 -    components. This means that the angle will be correct for the first
14 -    quadrant, equal to pi minus the true angle in the second quadrant,
15 -    equal to the true angle minus pi in the third quadrant and equal to
16 -    minus the true angle in the fourth quadrant. These angles are the
17 -    absolute values of the angles between the complex numbers and the
18 -    nearest real axis. If full angles are needed, the table can just be
19 -    quadrupled to handle sign bits in the index.
20 -
21 -    The vector length is passed in the parameter Length. The parameter
22 -    In_Data points to the vector to be converted. The output is placed
23 -    at Out_Data. The conversion can be performed in place if desired.
24 -
25 - */

26  - 

27 - /* need arctangent function for table */
28 - #include <math.h>
29 -
30 - /* number of bits from each mantissa to be used in arctangent table lookup */
31 - #define TAB_BITS 5
32 -
33 - /* number of Cordic iterations for magnitude calculations */
34 - #define MAG_ITER 3
35 -
36 - /* function to return arctangent table value for index number */
37 - /* only handles first quadrant angles, but could be modified for all four */
38 - float tabentry(int i)
39 - {
40 -     int fbits[2];
41 -     int part;
42 -     int index;
43 -
44 -     /* determine numbers that would have produced the given index */
45 -     for (part = 0; part <= 1; part++)
46 -     {
47 -         /* extract interleaved mantissa bits from index */
48 -         fbits[part] = 0;
49 -         for (index = 0; index < TAB_BITS; index++)
50 -         {
51 -             fbits[part] |= (1 << index) & (i >> index + part);
52 -         }
53 -     }
54 -
55 -     /* return middle angle of the possible range */
56 -     return (atan2((double) fbits[0] + 1, (double) fbits[1]) +
57 -             atan2((double) fbits[0], (double) fbits[1] + 1)) / 2.0;
58 - }

59  - 

60  - 

61 - /* Actual assembly generation function */
62 - zap325()
63 - {
64 -     int index;
65 -
66 -     /* Generate arctangent table. Because of normalization, only first
67 -        entry and last three quarters of table are actually used.
68 -     */
69 -     /#
70 -     AtanTab::
71 -     #/
72 -     for (index = 0; index < (1 << TAB_BITS*2); index++)
73 -     {
74 -         /#
75 -         .DATA { (IEEE_Float(tabentry(index))) };
76 -         #/
77 -     }

78  - 

79  - 

/I 

80  - 

SUBROUTINE  RECT2P0L( zr32Sint  Langtii,  zr32Sri8f  In.Data,  sr325raf  Out_Data) 

81  - 

< 

82  - 

83  - 

/*  aat  up  ttro  RAN  aactlona,  awapping  on  aacb  loop  Itaratloo  */ 

84  - 

SIT  (  -IMS,  -XOR  ]; 

85  - 

86  - 

/*  load  data  polntara,  paraaatar  ordar  gata  lD_Data  into  $A  */ 

87  - 

IDR  Out.Data  ->  [$B,  SA); 

88  - 

89  - 

/*  Inltlallxa  loop  count  to  nuabar  of  32a,  aklp  loop  if  nona  •/ 

90  - 

BHRSITR<[aBIPT>5]  Langtta  ->  SLC; 

91  - 

JMPC  (ZR),  Do_Raat; 

92  - 

93  - 

/*  flrat  part  of  loop  to  fill  aoftwara  plpallna  ■/ 

94  - 

95  - 

/*  load  to  bank  1,  taka  abaoluta  «alua  to  put  in  flrat  quadrant 

96  - 

lD_||l(32)  »A  ->  »C1| 

97  - 

/*  align  ■antlaaaa  and  intarlaava  to  craata  atan  indax  in  $10  *, 

90  - 

ALICMi(32)  $IU,  $11  ->  $10; 

99  - 

/*  do  oordlo  Itaratlona  to  got  aagnltiido  in  $R1,  takaa  a  wtalla  ' 

100  - 

NA0:(33,NAa_lTBI)  $C1; 
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101  - 
102  - 

103  - 

104  - 

105  - 

106  - 

107  - 

108  - 
100  - 
110  - 

111  -  Loopii 

112  - 

113  - 

114  - 

115  - 

116  - 

117  - 

118  - 

119  - 

120  - 
121  - 

122  -  Do_ator«t  t 

123  - 

124  - 

125  - 

126  - 

127  -  OoJtMtii 

128  - 

129  - 

130  - 

131  - 

132  * 

133  - 

134  - 

135  - 

136  - 

137  - 

138  - 

139  - 

140  - 

141  - 

142  - 

143  - 

144  - 

145  -  Bndii 

146  - 

147  -  } 

144  -  «/ 

149  -  ) 


/•  look  up  arcungent  In  tafala,  ovarlnpa  idtli  MAO  */ 

/*  aDctea  «1  la  bacausa  of  ttia  algn  blta  tactanlcally  Includad  */ 

UJTi(32)t[SBIVT-(23  -  2*(TAB_BITa*l))]  AtattTab,  $10  •>  $10; 

/*  atora  angla,  ovarlapa  with  NAO  */ 

aT_I>(32)  $10  ->  $Bt-l!(2,l); 

/*  daoraaant  $I.C,  and  loop  If  dona  */ 

OMPCtlDL]  [LZ],  Do_Stora; 

/*  aoftwara  plpallnad  loop,  allowa  noxt  load  to  ovarlap  MAS  */ 

lJ)_J!i(32J  $A«>e4  ->  $C1; 

/*  atora  aagnltuda  froa  pravloua  vactor  */ 

8T_R:(32)  $R0  •>  $B-1:(2,1); 

ALiaH:(32)  $R1,  $11  ->  $10; 

NAa:(32,MAO_rrER)  $C1; 

LUT:(32):(SBIFT-(23  -  2*(TAB_BITS*1))J  Atanlab,  $10  ->  $10; 
aT_Ii(32)  $10  ->  $B«-64i(2,l); 

/■  dacraaant  eountar  and  branch  to  top  If  not  dona  */ 

JMPCt[DL]  [ILZ],  Loop; 


/•  raat  of  loop  to  aaipty  aafttnua  plpallna  */ 

/*  atora  aagnltuda  froa  laat  vactor  */ 

8r_Ri(32)  $R0  •>  $B-1:(2,1); 


/■  handle  raaalndar  left  after  blocks  of  32  •/ 

/•  ablft  raaalndar  Into  $«KFT,  usa  ITC]  to  laro  hlgb  bit  */ 

8KLSZTRI  [SBIPr-16,3C]  Langth  •>  $FR; 

■JMK  [ZR],  Bnd; 

/*  naad  MAO.ITKR  In  $RBFBAX  to  uaa  $FR  with  MA0  •/ 

ADDR  $PR,  INAa_mR; 

/*  finish  up  raaalndar  */ 

LD_;|i($mPT)  $Af64  ->  $C1; 

ALiaili($IN>T)  $R1,  $11  ->  $10; 

MAOl{$HMFT,$RBPBAT)  $<;l; 

LOT:  ($HNFT):  [8HIPT-(23  -  2*(TAB_BITa«l))]  AtanTab,  $10  ->  $10; 
8T_Is($RMi>T}  $10  »  $B*«64)(2,1); 
er_Rj($K>^)  $R1  ->  $B-li(2,l); 
1 - /* Test program to see if Zorans work */
2 -
3 - /* absolute base addresses from memory map */
4 - #define PRAM 0x00000
5 - #define FOUR_PORT 0x20000
6 - #define STATUS_LATCH 0x40000
7 -
8 - #include "recip.asm"
9 -
10 - zap325()
11 - {
12 -     int i;
13 -     float x;
14 -
15 -     /* set up reciprocal starting table */
16 -     reciptab();
17 -
18 -     /#
19 -     .ORG 0
20 -     SUBROUTINE MAIN()
21 -     {
22 -         /* set up two RAM sections */
23 -         SET [•IMMS, -IXDR];
24 -         LD_R:(16) FOUR_PORT -> $R0;
25 -     #/
26 -     recip(16);
27 -     /#
28 -         ST_R:(16) $R0 -> FOUR_PORT;
29 -     }
30 -     #/
31 - }
1 - /* Initialization for synthetic stack frames for FFT2D16.
2 -    rstackN and cstackN are starting $SP values for chip N.
3 -    One free spot left in stacks for interrupt.
4 - */
5 -
6 - /* absolute base addresses from memory map */
7 - #define PRAM 0x00000
8 - #define FOUR_PORT 0x20000
9 - #define STATUS_LATCH 0x40000
10 -
11 - #define COLUMN(N) (FOUR_PORT + (N) * 2)
12 - #define ROW(N) (FOUR_PORT + (N) * 32)
13 -
14 - zap325()
15 - {
16 - /#
17 -
18 - SUBROUTINE FINISH();
19 - SUBROUTINE FFT16COL(zr325ref In_Data, zr325ref Out_Data);
20 - SUBROUTINE FFT16ROW(zr325ref In_Data, zr325ref Out_Data);
21 -
22 - .EXTERN _SubEntry_FINISH
23 - .EXTERN _SubEntry_FFT16COL
24 - .EXTERN _SubEntry_FFT16ROW
25 -
26 -
27 - STACKS::
28 -     .DATA { 0 };
29 - cstack0::
30 -     .DATA { 0 };
31 -     .DATA { &_SubEntry_FFT16COL };
32 -     .DATA { &_SubEntry_FINISH };
33 -     .DATA { COLUMN(0) };
34 -     .DATA { COLUMN(0) };
35 - cstack1::
36 -     .DATA { 0 };
37 -     .DATA { &_SubEntry_FFT16COL };
38 -     .DATA { &_SubEntry_FINISH };
39 -     .DATA { COLUMN(4) };
40 -     .DATA { COLUMN(4) };
41 - cstack2::
42 -     .DATA { 0 };
43 -     .DATA { &_SubEntry_FFT16COL };
44 -     .DATA { &_SubEntry_FINISH };
45 -     .DATA { COLUMN(8) };
46 -     .DATA { COLUMN(8) };
47 - cstack3::
48 -     .DATA { 0 };
49 -     .DATA { &_SubEntry_FFT16COL };
50 -     .DATA { &_SubEntry_FINISH };
51 -     .DATA { COLUMN(12) };
52 -     .DATA { COLUMN(12) };
53 - rstack0::
54 -     .DATA { 0 };
55 -     .DATA { &_SubEntry_FFT16ROW };
56 -     .DATA { &_SubEntry_FINISH };
57 -     .DATA { ROW(0) };
58 -     .DATA { ROW(0) };
59 - rstack1::
60 -     .DATA { 0 };
61 -     .DATA { &_SubEntry_FFT16ROW };
62 -     .DATA { &_SubEntry_FINISH };
63 -     .DATA { ROW(4) };
64 -     .DATA { ROW(4) };
65 - rstack2::
66 -     .DATA { 0 };
67 -     .DATA { &_SubEntry_FFT16ROW };
68 -     .DATA { &_SubEntry_FINISH };
69 -     .DATA { ROW(8) };
70 -     .DATA { ROW(8) };
71 - rstack3::
72 -     .DATA { 0 };
73 -     .DATA { &_SubEntry_FFT16ROW };
74 -     .DATA { &_SubEntry_FINISH };
75 -     .DATA { ROW(12) };
76 -     .DATA { ROW(12) };
77 - #/
78 - }
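
The COLUMN(N) and ROW(N) macros in the listing above turn a column or row number into a four-port memory address, and each chip's stack frame starts at a different column or row. The sketch below reproduces that address arithmetic in plain C. It assumes two words per complex sample, a 16 x 16 matrix, and an even split of the work over four chips; N_POINTS, N_CHIPS, and the function names are illustrative and do not appear in the listing.

```c
#include <stdint.h>

#define FOUR_PORT 0x20000u
#define N_POINTS  16u   /* assumed: 16 x 16 complex matrix */
#define N_CHIPS   4u    /* assumed: one cstack/rstack pair per chip */

/* Two words (real, imaginary) per complex sample. */
static uint32_t column_addr(uint32_t n)
{
    return FOUR_PORT + n * 2u;
}

/* One row holds N_POINTS complex samples, i.e. 2 * N_POINTS words. */
static uint32_t row_addr(uint32_t n)
{
    return FOUR_PORT + n * 2u * N_POINTS;
}

/* First column (or row) handled by a chip when work is split evenly. */
static uint32_t first_index(uint32_t chip)
{
    return chip * (N_POINTS / N_CHIPS);
}
```

With these assumptions, chip 1 starts at column 4 (address 0x20008) and at row 4 (address 0x20080), matching the COLUMN(4) and ROW(4) entries in the frames.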
File: B:STACK2.ASM

1 - /* Initialization for synthetic stack frames for FFT2D32.
2 -    rstackN and cstackN are starting $SP values for chip N.
3 -    One free spot left in stacks for interrupt.
4 - */
5 -
6 - /* absolute base addresses from memory map */
7 - #define PRAM 0x00000
8 - #define FOUR_PORT 0x20000
9 - #define STATUS_LATCH 0x40000
10 -
11 - #define COLUMN(N) (FOUR_PORT + (N) * 2)
12 - #define ROW(N) (FOUR_PORT + (N) * 64)
13 -
14 - zap325()
15 - {
16 - /#
17 -
18 - SUBROUTINE FINISH();
19 - SUBROUTINE FFT32COL(zr325ref In_Data, zr325ref Out_Data);
20 - SUBROUTINE FFT32ROW(zr325ref In_Data, zr325ref Out_Data);
21 -
22 - .EXTERN _SubEntry_FINISH
23 - .EXTERN _SubEntry_FFT32COL
24 - .EXTERN _SubEntry_FFT32ROW
25 -
26 -
27 - STACKS::
28 -     .DATA { 0 };
29 - cstack0::
30 -     .DATA { 0 };
31 -     .DATA { &_SubEntry_FFT32COL };
32 -     .DATA { &_SubEntry_FINISH };
33 -     .DATA { COLUMN(0) };
34 -     .DATA { COLUMN(0) };
35 - cstack1::
36 -     .DATA { 0 };
37 -     .DATA { &_SubEntry_FFT32COL };
38 -     .DATA { &_SubEntry_FINISH };
39 -     .DATA { COLUMN(8) };
40 -     .DATA { COLUMN(8) };
41 - cstack2::
42 -     .DATA { 0 };
43 -     .DATA { &_SubEntry_FFT32COL };
44 -     .DATA { &_SubEntry_FINISH };
45 -     .DATA { COLUMN(16) };
46 -     .DATA { COLUMN(16) };
47 - cstack3::
48 -     .DATA { 0 };
49 -     .DATA { &_SubEntry_FFT32COL };
50 -     .DATA { &_SubEntry_FINISH };
51 -     .DATA { COLUMN(24) };
52 -     .DATA { COLUMN(24) };
53 - rstack0::
54 -     .DATA { 0 };
55 -     .DATA { &_SubEntry_FFT32ROW };
56 -     .DATA { &_SubEntry_FINISH };
57 -     .DATA { ROW(0) };
58 -     .DATA { ROW(0) };
59 - rstack1::
60 -     .DATA { 0 };
61 -     .DATA { &_SubEntry_FFT32ROW };
62 -     .DATA { &_SubEntry_FINISH };
63 -     .DATA { ROW(8) };
64 -     .DATA { ROW(8) };
65 - rstack2::
66 -     .DATA { 0 };
67 -     .DATA { &_SubEntry_FFT32ROW };
68 -     .DATA { &_SubEntry_FINISH };
69 -     .DATA { ROW(16) };
70 -     .DATA { ROW(16) };
71 - rstack3::
72 -     .DATA { 0 };
73 -     .DATA { &_SubEntry_FFT32ROW };
74 -     .DATA { &_SubEntry_FINISH };
75 -     .DATA { ROW(24) };
76 -     .DATA { ROW(24) };
77 - #/
78 - }
1 - /* Initialization for synthetic stack frames for FFT1K.
2 -    rstackN and cstackN are starting $SP values for chip N.
3 -    One free spot left in stacks for interrupt.
4 - */
5 -
6 - /* absolute base addresses from memory map */
7 - #define PRAM 0x00000
8 - #define FOUR_PORT 0x20000
9 - #define STATUS_LATCH 0x40000
10 -
11 - #define COLUMN(N) (FOUR_PORT + (N) * 2)
12 - #define ROW(N) (FOUR_PORT + (N) * 64)
13 - #define OUT_OFFSET 0x800
14 -
15 - zap325()
16 - {
17 - /#
18 -
19 - SUBROUTINE FINISH();
20 - SUBROUTINE FFT32COL(zr325ref In_Data, zr325ref Out_Data);
21 - SUBROUTINE FFT1K0(zr325ref In_Data, zr325ref Out_Data);
22 - SUBROUTINE FFT1K1(zr325ref In_Data, zr325ref Out_Data);
23 - SUBROUTINE FFT1K2(zr325ref In_Data, zr325ref Out_Data);
24 - SUBROUTINE FFT1K3(zr325ref In_Data, zr325ref Out_Data);
25 -
26 - .EXTERN _SubEntry_FINISH
27 - .EXTERN _SubEntry_FFT32COL
28 - .EXTERN _SubEntry_FFT1K0
29 - .EXTERN _SubEntry_FFT1K1
30 - .EXTERN _SubEntry_FFT1K2
31 - .EXTERN _SubEntry_FFT1K3
32 -
33 - STACKS::
34 -     .DATA { 0 };
35 - cstack0::
36 -     .DATA { 0 };
37 -     .DATA { &_SubEntry_FFT32COL };
38 -     .DATA { &_SubEntry_FINISH };
39 -     .DATA { COLUMN(0) };
40 -     .DATA { COLUMN(0) };
41 - cstack1::
42 -     .DATA { 0 };
43 -     .DATA { &_SubEntry_FFT32COL };
44 -     .DATA { &_SubEntry_FINISH };
45 -     .DATA { COLUMN(8) };
46 -     .DATA { COLUMN(8) };
47 - cstack2::
48 -     .DATA { 0 };
49 -     .DATA { &_SubEntry_FFT32COL };
50 -     .DATA { &_SubEntry_FINISH };
File: B:STACK3.ASM

51 -     .DATA { COLUMN(16) };
52 -     .DATA { COLUMN(16) };
53 - cstack3::
54 -     .DATA { 0 };
55 -     .DATA { &_SubEntry_FFT32COL };
56 -     .DATA { &_SubEntry_FINISH };
57 -     .DATA { COLUMN(24) };
58 -     .DATA { COLUMN(24) };
59 - rstack0::
60 -     .DATA { 0 };
61 -     .DATA { &_SubEntry_FFT1K0 };
62 -     .DATA { &_SubEntry_FINISH };
63 -     .DATA { (COLUMN(0) + OUT_OFFSET) };
64 -     .DATA { ROW(0) };
65 - rstack1::
66 -     .DATA { 0 };
67 -     .DATA { &_SubEntry_FFT1K1 };
68 -     .DATA { &_SubEntry_FINISH };
69 -     .DATA { (COLUMN(8) + OUT_OFFSET) };
70 -     .DATA { ROW(8) };
71 - rstack2::
72 -     .DATA { 0 };
73 -     .DATA { &_SubEntry_FFT1K2 };
74 -     .DATA { &_SubEntry_FINISH };
75 -     .DATA { (COLUMN(16) + OUT_OFFSET) };
76 -     .DATA { ROW(16) };
77 - rstack3::
78 -     .DATA { 0 };
79 -     .DATA { &_SubEntry_FFT1K3 };
80 -     .DATA { &_SubEntry_FINISH };
81 -     .DATA { (COLUMN(24) + OUT_OFFSET) };
82 -     .DATA { ROW(24) };
83 - #/
84 - }

Date: 7/20/92

File: B:STACK4.ASM

1 - /* Initialization for synthetic stack frame for RCONV.
2 -    stack is starting $SP value.
3 - */
4 -
5 - /* absolute base addresses from memory map */
6 - #define PRAM 0x00000
7 - #define FOUR_PORT 0x20000
8 - #define STATUS_LATCH 0x40000
9 -
10 - #define COEF_LEN 8
11 - #define OUT_LEN 25
12 -
13 - zap325()
14 - {
15 - /#
16 -
17 - SUBROUTINE FINISH();
18 - SUBROUTINE RCONV(zr325int Coef_Length,
19 -                  zr325int Out_Length,
20 -                  zr325ref Coefficients,
21 -                  zr325ref In_Data,
22 -                  zr325ref Out_Data);
23 -
24 - .EXTERN _SubEntry_FINISH
25 - .EXTERN _SubEntry_RCONV
26 -
27 -
28 - /* define stack with parameters in reverse order */
29 - STACKS::
30 -     .DATA { 0 };
31 - stack::
32 -     .DATA { 0 };
33 -     .DATA { &_SubEntry_RCONV };
34 -     .DATA { &_SubEntry_FINISH };
35 -     .DATA { (FOUR_PORT + 0x100) };
36 -     .DATA { (FOUR_PORT + COEF_LEN) };
37 -     .DATA { FOUR_PORT };
38 -     .DATA { OUT_LEN };
39 -     .DATA { COEF_LEN };
40 - #/
41 - }
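
The stack in the listing above holds the subroutine arguments "in reverse order": the last declared parameter is stored first. The sketch below builds just the argument portion of such a frame in plain C, using the constants from the listing. The function name build_rconv_args is hypothetical, and the leading zero word and the two entry addresses are left out because their values are only known to the assembler.

```c
#include <stddef.h>
#include <stdint.h>

#define FOUR_PORT 0x20000u
#define COEF_LEN  8u
#define OUT_LEN   25u

/* Fill frame[] with the five arguments in reverse declaration order.
   Returns the number of words written. */
size_t build_rconv_args(uint32_t frame[5])
{
    /* Declaration order: Coef_Length, Out_Length, Coefficients,
       In_Data, Out_Data. */
    const uint32_t args[5] = {
        COEF_LEN,
        OUT_LEN,
        FOUR_PORT,               /* coefficients at start of four-port RAM */
        FOUR_PORT + COEF_LEN,    /* input data directly after coefficients */
        FOUR_PORT + 0x100u       /* output data */
    };
    size_t i;

    for (i = 0; i < 5; i++)
        frame[i] = args[4 - i];  /* last parameter is stored first */
    return 5;
}
```

Read this way, the frame places the input vector immediately after the eight coefficients and the output at offset 0x100.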

Page: 1

File: B:STACK5.ASM

Date: 7/20/92

1 - /* Initialization for synthetic stack frame for CCONV.
2 -    stack is starting $SP value.
3 - */
4 -
5 - /* absolute base addresses from memory map */
6 - #define PRAM 0x00000
7 - #define FOUR_PORT 0x20000
8 - #define STATUS_LATCH 0x40000
9 -
10 - #define COEF_LEN 4
11 - #define OUT_LEN 13
12 -
13 - zap325()
14 - {
15 - /#
16 -
17 - SUBROUTINE FINISH();
18 - SUBROUTINE CCONV(zr325int Coef_Length,
19 -                  zr325int Out_Length,
20 -                  zr325ref Coefficients,
21 -                  zr325ref In_Data,
22 -                  zr325ref Out_Data);
23 -
24 - .EXTERN _SubEntry_FINISH
25 - .EXTERN _SubEntry_CCONV
26 -
27 -
28 - /* define stack with parameters in reverse order */
29 - STACKS::
30 -     .DATA { 0 };
31 - stack::
32 -     .DATA { 0 };
33 -     .DATA { &_SubEntry_CCONV };
34 -     .DATA { &_SubEntry_FINISH };
35 -     .DATA { (FOUR_PORT + 0x100) };
36 -     .DATA { (FOUR_PORT + COEF_LEN*2) };
37 -     .DATA { FOUR_PORT };
38 -     .DATA { OUT_LEN };
39 -     .DATA { COEF_LEN };
40 - #/
41 - }


Page: 1

Date: 7/20/92

File: B:STACK6.ASM

1 - /* Initialization for synthetic stack frames for RCORR.
2 -    stack is starting $SP value.
3 - */
4 -
5 - /* absolute base addresses from memory map */
6 - #define PRAM 0x00000
7 - #define FOUR_PORT 0x20000
8 - #define STATUS_LATCH 0x40000
9 -
10 - #define COEF_LEN 8
11 - #define OUT_LEN 25
12 -
13 - zap325()
14 - {
15 - /#
16 -
17 - SUBROUTINE FINISH();
18 - SUBROUTINE RCORR(zr325int Coef_Length,
19 -                  zr325int Out_Length,
20 -                  zr325ref Coefficients,
21 -                  zr325ref In_Data,
22 -                  zr325ref Out_Data);
23 -
24 - .EXTERN _SubEntry_FINISH
25 - .EXTERN _SubEntry_RCORR
26 -
27 -
28 - /* define stack with parameters in reverse order */
29 - STACKS::
30 -     .DATA { 0 };
31 - stack::
32 -     .DATA { 0 };
33 -     .DATA { &_SubEntry_RCORR };
34 -     .DATA { &_SubEntry_FINISH };
35 -     .DATA { (FOUR_PORT + 0x100) };
36 -     .DATA { (FOUR_PORT + COEF_LEN) };
37 -     .DATA { FOUR_PORT };
38 -     .DATA { OUT_LEN };
39 -     .DATA { COEF_LEN };
40 - #/
41 - }


Page: 1

Date: 7/20/92

File: B:STACK7.ASM

1 - /* Initialization for synthetic stack frame for CCORR.
2 -    stack is starting $SP value.
3 - */
4 -
5 - /* absolute base addresses from memory map */
6 - #define PRAM 0x00000
7 - #define FOUR_PORT 0x20000
8 - #define STATUS_LATCH 0x40000
9 -
10 - #define COEF_LEN 4
11 - #define OUT_LEN 13
12 -
13 - zap325()
14 - {
15 - /#
16 -
17 - SUBROUTINE FINISH();
18 - SUBROUTINE CCORR(zr325int Coef_Length,
19 -                  zr325int Out_Length,
20 -                  zr325ref Coefficients,
21 -                  zr325ref In_Data,
22 -                  zr325ref Out_Data);
23 -
24 - .EXTERN _SubEntry_FINISH
25 - .EXTERN _SubEntry_CCORR
26 -
27 -
28 - /* define stack with parameters in reverse order */
29 - STACKS::
30 -     .DATA { 0 };
31 - stack::
32 -     .DATA { 0 };
33 -     .DATA { &_SubEntry_CCORR };
34 -     .DATA { &_SubEntry_FINISH };
35 -     .DATA { (FOUR_PORT + 0x100) };
36 -     .DATA { (FOUR_PORT + COEF_LEN*2) };
37 -     .DATA { FOUR_PORT };
38 -     .DATA { OUT_LEN };
39 -     .DATA { COEF_LEN };
40 - #/
41 - }


Page:

Date: 7/20/92

File: B:STACK8.ASM

1 - /* Initialization for synthetic stack frame for POL2RECT.
2 -    stack is starting $SP value.
3 - */
4 -
5 - /* absolute base addresses from memory map */
6 - #define PRAM 0x00000
7 - #define FOUR_PORT 0x20000
8 - #define STATUS_LATCH 0x40000
9 -
10 - #define LENGTH 200
11 -
12 - zap325()
13 - {
14 - /#
15 -
16 - SUBROUTINE POL2RECT(zr325int Length, zr325ref In_Data, zr325ref Out_Data);
17 - SUBROUTINE FINISH();
18 -
19 - .EXTERN _SubEntry_POL2RECT
20 - .EXTERN _SubEntry_FINISH
21 -
22 -
23 - /* define stack with parameters in reverse order */
24 - STACKS::
25 -     .DATA { 0 };
26 - stack::
27 -     .DATA { 0 };
28 -     .DATA { &_SubEntry_POL2RECT };
29 -     .DATA { &_SubEntry_FINISH };
30 -     .DATA { FOUR_PORT };
31 -     .DATA { FOUR_PORT };
32 -     .DATA { LENGTH };
33 - #/
34 - }
35 -


Page: 1

Date: 7/20/92

File: B:STACK9.ASM

1 - /* Initialization for synthetic stack frame for RECT2POL.
2 -    stack is starting $SP value.
3 - */
4 -
5 - /* absolute base addresses from memory map */
6 - #define PRAM 0x00000
7 - #define FOUR_PORT 0x20000
8 - #define STATUS_LATCH 0x40000
9 -
10 - #define LENGTH 200
11 -
12 - zap325()
13 - {
14 - /#
15 -
16 - SUBROUTINE RECT2POL(zr325int Length, zr325ref In_Data, zr325ref Out_Data);
17 - SUBROUTINE FINISH();
18 -
19 - .EXTERN _SubEntry_RECT2POL
20 - .EXTERN _SubEntry_FINISH
21 -
22 -
23 - /* define stack with parameters in reverse order */
24 - STACKS::
25 -     .DATA { 0 };
26 - stack::
27 -     .DATA { 0 };
28 -     .DATA { &_SubEntry_RECT2POL };
29 -     .DATA { &_SubEntry_FINISH };
30 -     .DATA { FOUR_PORT };
31 -     .DATA { FOUR_PORT };
32 -     .DATA { LENGTH };
33 - #/
34 - }


Page: 1

Date: 7/20/92

File: B:STACKB.ASM

1 - /* Initialization for synthetic stack frames for FFT1K benchmark.
2 -    stackN is starting $SP value for chip N.
3 -    Execution sequence is to start at STARTUP, return to do columns,
4 -    return to synchronize for second wave, pop parameters, return
5 -    to do rows, then return to finish.
6 -    One free spot left in stacks for interrupt.
7 - */
8 -
9 - /* absolute base addresses from memory map */
10 - #define PRAM 0x00000
11 - #define FOUR_PORT 0x20000
12 - #define STATUS_LATCH 0x40000
13 -
14 - #define COLUMN(N) (FOUR_PORT + (N) * 2)
15 - #define ROW(N) (FOUR_PORT + (N) * 64)
16 - #define OUT_OFFSET 0x800
17 -
18 - zap325()
19 - {
20 - /#
21 -
22 - SUBROUTINE FINISH();
23 - SUBROUTINE SYNCHRONIZE();
24 - SUBROUTINE FFT32COL(zr325ref In_Data, zr325ref Out_Data);
25 - SUBROUTINE FFT1K0(zr325ref In_Data, zr325ref Out_Data);
26 - SUBROUTINE FFT1K1(zr325ref In_Data, zr325ref Out_Data);
27 - SUBROUTINE FFT1K2(zr325ref In_Data, zr325ref Out_Data);
28 - SUBROUTINE FFT1K3(zr325ref In_Data, zr325ref Out_Data);
29 -
30 - .EXTERN _SubEntry_FINISH
31 - .EXTERN _SubEntry_SYNCHRONIZE
32 - .EXTERN _SubEntry_FFT32COL
33 - .EXTERN _SubEntry_FFT1K0
34 - .EXTERN _SubEntry_FFT1K1
35 - .EXTERN _SubEntry_FFT1K2
36 - .EXTERN _SubEntry_FFT1K3
37 -
38 - STACKS::
39 -     .DATA { 0 };
40 - stack0::
41 -     .DATA { 0 };
42 -     .DATA { &_SubEntry_FFT32COL };
43 -     .DATA { &_SubEntry_SYNCHRONIZE };
44 -     .DATA { COLUMN(0) };
45 -     .DATA { COLUMN(0) };
46 -     .DATA { &_SubEntry_FFT1K0 };
47 -     .DATA { &_SubEntry_FINISH };
48 -     .DATA { (COLUMN(0) + OUT_OFFSET) };
49 -     .DATA { ROW(0) };
50 - stack1::

Page: 1

Date: 7/20/92

File: B:STACKB.ASM

51 -     .DATA { 0 };
52 -     .DATA { &_SubEntry_FFT32COL };
53 -     .DATA { &_SubEntry_SYNCHRONIZE };
54 -     .DATA { COLUMN(8) };
55 -     .DATA { COLUMN(8) };
56 -     .DATA { &_SubEntry_FFT1K1 };
57 -     .DATA { &_SubEntry_FINISH };
58 -     .DATA { (COLUMN(8) + OUT_OFFSET) };
59 -     .DATA { ROW(8) };
60 - stack2::
61 -     .DATA { 0 };
62 -     .DATA { &_SubEntry_FFT32COL };
63 -     .DATA { &_SubEntry_SYNCHRONIZE };
64 -     .DATA { COLUMN(16) };
65 -     .DATA { COLUMN(16) };
66 -     .DATA { &_SubEntry_FFT1K2 };
67 -     .DATA { &_SubEntry_FINISH };
68 -     .DATA { (COLUMN(16) + OUT_OFFSET) };
69 -     .DATA { ROW(16) };
70 - stack3::
71 -     .DATA { 0 };
72 -     .DATA { &_SubEntry_FFT32COL };
73 -     .DATA { &_SubEntry_SYNCHRONIZE };
74 -     .DATA { COLUMN(24) };
75 -     .DATA { COLUMN(24) };
76 -     .DATA { &_SubEntry_FFT1K3 };
77 -     .DATA { &_SubEntry_FINISH };
78 -     .DATA { (COLUMN(24) + OUT_OFFSET) };
79 -     .DATA { ROW(24) };
80 - #/
81 - }

Page: 2

Date: 7/20/92


FUai  Bsnsur.aai 


1 - /* Code to start all VSP chips simultaneously. The start address of the
2 -    code to be executed at the signal should be the first value on the
3 -    stack. Placed at absolute location 0 to simplify startup.
4 - */
5 -
6 - /* absolute base addresses from memory map */
7 - #define PRAM 0x00000
8 - #define FOUR_PORT 0x20000
9 - #define STATUS_LATCH 0x40000
10 -
11 - /* status bit value to indicate start */

12  -  Idaflna  BOia  2 

13  - 

14  -  aapllSO 

15  -  ( 

16  -  /I 

17  -  .DM  0 

la  -  auBRounn  ariuaupo 

19  -  ( 

20  •  /*  xaaat  atatua  blta  */ 

21  -  UR  10  ->  $X; 

22  -  BTR  $x  •>  animia_iJtt<a; 

23  . 

24  -  /•  gat  aaak  for  atart  bit  •/ 

25  -  UR  ISXMfr  •>  $X; 

26  - 

27  -  polln 

28  -  BRDRlISR]  8t»T08_IJa(CB,  $X« 

29  -  UOP  C2»),  II, 

30  - 

31  -  > 

32  -  1/ 

33  -  ) 


Page: 1

Date: 7/20/92

File: B:STATUS.ASM

1 - /* Test program to make Zoran status bits follow 68020 bits */
2 -
3 - /* absolute base addresses from memory map */
4 - #define PRAM 0x00000
5 - #define FOUR_PORT 0x20000
6 - #define STATUS_LATCH 0x40000
7 -
8 - zap325()
9 - {
10 -
11 - /#

12  -  auBROonn  m8ih(  ) 

13  -  { 

14  -  Top:: 

15  -  LOR  amauajutxcB  •>  Si^i 

16  -  BTR  tLC  »  SXkSUa.IATCB) 

17  -  Loopii 

18  -  xoRRiCni]  aiKru8_ut!rcB,  tic  •>  sx; 

19  -  ARim  *3.  tXf 

20  -  OMPC  [ZRl,  lAopi 

21  - 

22  -  ONP  Xop; 

23  -  ) 

24  -  1/ 

25  -  } 


Page: 1

Date: 7/20/92

File: B:SYNC.ASM

1 - /* to synchronize VSPs between waves of FFT. Also needs to pop the
2 -    parameters of the first wave before returning.
3 - */
4 -
5 - /* absolute base addresses from memory map */
6 - #define PRAM 0x00000
7 - #define FOUR_PORT 0x20000
8 - #define STATUS_LATCH 0x40000
9 -
10 - /* status bit value to indicate wave sync */
11 - #define WAVE 1
12 -
13 - /* define number of parameters we supposedly sent */
14 - #define NUM_PARAM 2
15 -
16 - zap325()
17 - {
18 - /#

19  -  .azscmccBsa 

20  - 

21  -  MnAounn  syrcbkmzbk) 

22  -  { 

23  -  /•  aat  atatna  bit  •/ 

24  -  LOR  $mn  •>  $X; 

25  -  SIR  «X  •>  01ljmn_LKrCB; 

26  - 

27  -  /•  gat  rid  of  aynthatlo  paraaatara  */ 

28  -  8DDR  fHUH.PARAN,  SSP; 

29  > 

30  - 

31  -  /*  wait  for  ayno  raaponaa  */ 

32  -  Folli: 

33  -  8IR»:(TR]  aiATUS.LKKB,  $X; 

34  -  UXIP  (ZR),  fl: 

35  - 

36  -  ) 

37  -  f/ 

38  -  ) 


Page: 1

Date: 7/20/92

File: B:TEST1.ASM

1 - /* Test program to see if Zorans work */
2 -
3 - /* absolute base addresses from memory map */
4 - #define PRAM 0x00000
5 - #define FOUR_PORT 0x20000
6 - #define STATUS_LATCH 0x40000
7 -
8 - zap325()
9 - {
10 -     int i;
11 -     float x;
12 -
13 -     /* put a vector of (1.0, x) at PRAM + 0x400 */
14 -     /#
15 -     .ORG (PRAM + 0x400)
16 -     #/
17 -     for (i = 0, x = 0.0; i < 16; i++, x += 1.0)
18 -     {
19 -         /#
20 -         .DATA { 1.0, IEEE_Float(x) };
21 -         #/
22 -     }
23 -
24 -     /* put a vector of (x, 1.0) at FOUR_PORT */
25 -     /#
26 -     .ORG FOUR_PORT
27 -     #/
28 -     for (i = 0, x = 0.0; i < 16; i++, x += 1.0)
29 -     {
30 -         /#
31 -         .DATA { IEEE_Float(x), 1.0 };
32 -         #/
33 -     }
34 -
35 -     /#
36 -     .ORG
37 -     SUBROUTINE MAIN()
38 -     {
39 -         /* write 0 to status latch */
40 -         LDR #0 -> $X;
41 -         STR $X -> STATUS_LATCH;
42 -
43 -         /* add two complex vectors and store */
44 -         LD_C:(16) (PRAM + 0x400) -> $C0;
45 -         ADD_C:(16) FOUR_PORT, $C0 -> $C0;
46 -         ST_C:(16) $C0 -> FOUR_PORT;
47 -
48 -         /* make sure we are finished, then write 1s to status latch */
49 -         SYNC:[CU,BU,MU];
50 -         LDR #3 -> $X;


Page: 2

Date: 7/20/92

File: B:TEST2.ASM

/* Test program for Zoran interrupts. Main routine is an infinite loop
   that decrements $LC (starting at 0 and wrapping around in 16 bits).
   Interrupt routine sets status and halts. After restart, it clears
   status again and returns to infinite loop.
*/

- /* absolute base addresses from memory map */
- #define PRAM 0x00000
- #define FOUR_PORT 0x20000
- #define STATUS_LATCH 0x40000

- zap325()
- {
- /#

-  UnSKRUPT  SUBBOUTIHB  a8T_HAIT(  ) 

-  ( 

-  /*  wrlta  la  to  atatua  latch  aiul  wait  for  IIBI  */ 

lOR  13  ->  $X; 

-  8TR  $X  •>  8TA!rU8_IJaca| 

-  InfLoopii 

>  MfDRlini]  9IF,  *0x001000; 

-  JMK  [IZR],  InfLoop; 

•  /■  aftar  raauaa,  olaar  atatua  blta  */ 

U>R  *0  ->  $X; 

-  STR  $X  •>  8t»T08_b»*CB; 

-  > 

•  .BXTKRH  _8ubBntry_Sffr_BAI<T; 


-  suBRounn  maih(  ) 

-  { 


/*  aat  Intarrupt  vactor  (happana  to  ba  0,  but  why  not)  •/ 

LOR  S_8ubBntry_5BT_BJU.T  ->  $IF; 

/■  wrlta  0  to  atatua  latch  •/ 
lor  10  •>  $X; 

SIR  $X  ->  aTJIZUS_LKrCB; 

/*  Inflnlta  loop  dacraaantlng  $LC  frcxi  0  •/ 

WVR  SX  ->  »LC; 


-  Loop:  I 


•IMP:  [DL]  Loop; 


-  ) 


1/ 

> 


APPENDIX C


PC INTERFACE PROGRAMS


C-1


Date: 7/10/92
Size: 16156


File: IO.C

Last Modified: Tue Jun 30 16:19:12 1992


Routine: InitScr() -- Initialize the display.
Inputs: Port - Current port value for reads and writes.


314 - static
315 - void InitScr(UINT Port)
317 - /* Print current base number. */
318 - gotoxy(1, 14);



    clreol();
    textcolor( LIGHTBLUE );
    textbackground( LIGHTGRAY );
    cputs( "Base Value" );
    textcolor( LIGHTGRAY );
    textbackground( BLACK );

    /*Print current port number.*/
    gotoxy(1, 15);
    clreol();
    textcolor( LIGHTBLUE );
    textbackground( LIGHTGRAY );
    cputs( "Port Value" );
    textcolor( LIGHTGRAY );
    textbackground( BLACK );
}


/*
Routine: CheckFile() -- Check file type and possibly count length
Inputs:  Filename - Filename for file to check
Outputs: Size - Pointer to UINT to store length in words
Returns: Type of file, HEX_FORMAT or S_FORMAT
*/

static
int CheckFile( char *Filename, UINT *Size )
{
    FILE *InFile;
    int c;
    unsigned long nibbles;

    /*open file */
    InFile = fopen( Filename, "rt" );
    if (!InFile)
        return( INVALID );

    /*find first non-white character */
    while ( isspace(c = getc(InFile)) )
        ;

    /*check for S format */
    if (c == 'S')
    {
        fclose( InFile );
        return( S_FORMAT );
    }

    /*otherwise hex format, count nibbles */
    nibbles = 0;
    while ( c != EOF )
    {
        if ( isxdigit(c) )
            nibbles++;
        c = getc(InFile);
    }

    /*compute size in words (rounding up) and return format */
    fclose(InFile);
    *Size = (nibbles + 3) >> 2;
    return( HEX_FORMAT );
}
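
The size computation in CheckFile() counts hex digits and converts them to 16-bit words, rounding a partial trailing word up. The following is a minimal sketch of that arithmetic on an in-memory string rather than a file; the names count_nibbles and nibbles_to_words are hypothetical and do not appear in the listing, and the "+ 3" is read from a partly illegible line.

```c
#include <ctype.h>

/* Count hex digits in a text buffer, ignoring everything else
   (spaces, newlines), as the hex-format branch of the listing does. */
unsigned long count_nibbles(const char *text)
{
    unsigned long nibbles = 0;

    for (; *text != '\0'; text++) {
        if (isxdigit((unsigned char)*text))
            nibbles++;
    }
    return nibbles;
}

/* Four nibbles make one 16-bit word; adding 3 before the shift
   rounds a partial trailing word up. */
unsigned long nibbles_to_words(unsigned long nibbles)
{
    return (nibbles + 3) >> 2;
}
```

For example, a file holding six hex digits is reported as two words, the second of which is later zero padded by GetWord().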


/*
Routine: Upload() -- Transfer memory from VPH to PC file
Inputs:  Start - start address
         Size - Number of words to transfer (should be even)
         Filename - Name of destination file
*/

static
void Upload( unsigned long Start, UINT Size, char *Filename )
{
    FILE *Outfile;
    UINT Count;
    UINT Val;

    /*open output file */
    Outfile = fopen(Filename, "wt");
    if ( Outfile == NULL )
    {
        gotoxy(1, 23);
        cprintf("File open failed \x07");
        sleep(3);
        return;
    }

    /*send transfer command to VPH */
    outport(Port, 0x0021);

    /*send size in longwords */
    outport(Port, (Size >> 1));
    outport(Port, (int) Start);

    /*upload file */
    for (Count = 0; Count < Size; Count++)
    {
        /*wait until read fifo not empty */




        while ( !( inport(Base + 2) & RD_FIFO_EMPTY) )
            ;

        /*get word and send to file */
        Val = inport(Port);
        fprintf(Outfile, "%04x", Val);
        if ( (Count & 0xf) == 0xf )
            putc('\n', Outfile);


/*
Routine: GetWord() -- reads hex words from input file
         pads end differently than scanf for odd # of bytes
Inputs:  Infile - Input file pointer
Returns: Integer value read
*/

static
UINT GetWord( FILE *Infile )
{
    auto UINT Val;
    auto int c;
    static char Buf[5] = "0000";
    auto int nibble;

    /*read 4 hex digits */
    for (nibble = 0; nibble < 4; nibble++)
    {
        /*ignore embedded spaces and newlines */
        while ( isspace(c = getc(Infile)) )
            ;

        /*store hex digits, zero pad at EOF */
        if (isxdigit(c))
            Buf[nibble] = c;
        else if (c == EOF)
            Buf[nibble] = '0';
        else
        {
            gotoxy(1, 23);
            clreol();
            cprintf("Unexpected character %c in file \x07", c);
            sleep(2);
            Buf[nibble] = '0';
        }
    }

    /*convert to integer */
    sscanf(Buf, "%x", &Val);
    return(Val);
}
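
The padding behaviour of GetWord() is easiest to see on a short input. The sketch below applies the same rule (skip whitespace, take up to four hex digits, fill missing digits on the right with '0') to a string cursor instead of a FILE pointer; read_hex_word and its cursor argument are assumptions made for illustration, and the on-screen error report of the listing is omitted.

```c
#include <ctype.h>
#include <stdlib.h>

/* Read one 16-bit word as four hex digits from *cursor, advancing it.
   End of string stands in for EOF: missing digits stay '0', so "AB"
   at the end of the input reads as 0xAB00. */
unsigned int read_hex_word(const char **cursor)
{
    char buf[5] = "0000";
    int nibble;

    for (nibble = 0; nibble < 4; nibble++) {
        /* ignore embedded spaces and newlines */
        while (**cursor != '\0' && isspace((unsigned char)**cursor))
            (*cursor)++;
        if (**cursor == '\0')
            continue;               /* zero pad at end of input */
        if (isxdigit((unsigned char)**cursor))
            buf[nibble] = **cursor;
        /* any other character is replaced by '0', as in the listing */
        (*cursor)++;
    }
    return (unsigned int)strtoul(buf, NULL, 16);
}
```

Unlike a plain scanf("%x") on the same text, which would read a trailing "AB" as 0x00AB, this keeps the two digits in the high byte of the word.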


/*
Routine: SendBlock() -- Transfer block from PC file to VPH memory
Inputs:  Infile - Input file pointer
         Start - Start address
         Size - Number of words to transfer, will pad to even
*/

static
void SendBlock( FILE *Infile, unsigned long Start, UINT Size )
{
    UINT Val;
    UINT Count;

    /*send transfer command to VPH */
    outport(Port, 0x0020);

    /*send size in longwords, round up */
    outport(Port, (Size + 1) >> 1);
    outport(Port, (int) Start);

    /*handle expected number of words */
    for (Count = 0; Count < Size; Count++)
    {
        /*read a word from file */
        Val = GetWord(Infile);

        /*wait until write fifo not full */
        while ( !( inport(Base + 2) & WR_FIFO_FULL) )
            ;

        /*write word */
        outport(Port, Val);
    }

    /*if number of words is odd, pad with 0 */
    if (Size & 1)
    {
        /*wait until write fifo not full */
        while ( !( inport(Base + 2) & WR_FIFO_FULL) )
            ;

        /*write pad word */
        outport(Port, 0x0000);
    }
}
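
The word sequence that SendBlock() pushes through the port can be sketched without any hardware access. The function below only builds that sequence in an array: command word, size in longwords rounded up, the low 16 bits of the start address, the data words, and a zero pad word for odd counts. frame_block and CMD_SEND_BLOCK are hypothetical names, and any additional address words the real protocol may need are not shown because the listing does not show them.

```c
#include <stddef.h>
#include <stdint.h>

#define CMD_SEND_BLOCK 0x0020u   /* command word written first in the listing */

/* Fill `out` with the words for one block and return how many were
   written. `out` must have room for size + 4 words. */
size_t frame_block(const uint16_t *data, uint16_t size,
                   uint16_t start_low, uint16_t *out)
{
    size_t n = 0;
    uint16_t i;

    out[n++] = CMD_SEND_BLOCK;
    out[n++] = (uint16_t)(((uint32_t)size + 1u) >> 1); /* longwords, rounded up */
    out[n++] = start_low;                              /* low 16 bits of start */
    for (i = 0; i < size; i++)
        out[n++] = data[i];
    if (size & 1u)
        out[n++] = 0x0000;                             /* pad to a full longword */
    return n;
}
```

With three data words the frame holds seven words: the count field is 2 and the last word is the pad.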


File: B:\IO\IO.OLD
Last Modified: Tue Jun 30 16:19:34 1992

Date: 7/10/92
Size: 15505


/*
 * Module:    Test Module - Test I/O boards.
 *
 * Author:    John Stevens
 *
 * Copyright: Copyright 1988-1992 by Space Tech Corporation,
 *            Ft Collins, Colorado, USA. All Rights Reserved.
 *
 * Status:    This program is the sole property of Space Tech Corporation
 *            and is covered under non-disclosure agreements. This program
 *            is PROPRIETARY / CONFIDENTIAL / TRADE SECRET, and disclosure
 *            of the contents of this document shall constitute violation
 *            of signed agreements and will result in severe penalties.
 */


ff 

ir»!t 

typedef unsigned int UINT;

/*File format flags*/
#define INVALID     0
#define HEX_FORMAT  1
#define S_FORMAT    2

/*Maximum filename length; must be less than 255 */
#define NAME_SIZE   50

static UINT Base = 0x340;
static UINT Port;


/*
Routine: ShowStat() -- show the value of the status register.
Inputs:  Base - Base I/O address of board to check.
*/

static
void ShowStat(int Base)
{
    register int i;
    register int j;
    auto int bit;
    static char *StatReg[] =
    {
        "Parity Error - ",
        "Write FIFO Empty - ",
        "Write FIFO Full - ",
        "Write FIFO Almost Empty - ",
        "Write FIFO Almost Full - ",
        "Read FIFO Empty - ",
        "Read FIFO Full - ",
        "Read FIFO Almost Empty - ",
        "Read FIFO Almost Full - "
    };

    /*Print status header.*/
    textcolor( LIGHTBLUE );
    textbackground( LIGHTGRAY );
    cprintf(" I/O Board Status \r\n");
    textcolor( LIGHTGRAY );
    textbackground( BLACK );

    /*Read the status port, display values.*/
    j = inport(Base + 2);

    /*Print values.*/
    for (i = 8, bit = RD_FIFO_ALMOST_FULL; i >= 0; i--)
    {
        /*Print status values.*/
            textcolor( LIGHTGRAY );
            textbackground( BLACK );
            cprintf("False\r\n");
        else
            textcolor( BLACK );
            textbackground( LIGHTGRAY );
            cprintf("True \r\n");

        bit >>= 1;
    }

    /*Reset to normal colors.*/
    textcolor( LIGHTGRAY );
    textbackground( BLACK );



    /*Show value of status bits in status register.*/
    cprintf("Status Bit Values - 0x%x\r\n", j & 0x3);


/*
Routine: GetHexNo() -- Get a hexadecimal number.
Returns: Returns an integer number.
*/

static
unsigned long GetHexNo(int Limit)
{
    register int i;
    register int c;
    auto unsigned long ret;
    auto char Bf[9];

    /*Get characters from the keyboard.*/
    for (i = 0; ; )
    {
        /*Get hex characters.*/
        do
        {
            c = getch();
            c = tolower( c );
            if (! (c >= 'a' && c <= 'f') &&
                ! (c >= '0' && c <= '9') &&
                c != '\b' && c != '\r')
                putch( '\x07' );
        } while (! (c >= 'a' && c <= 'f') &&
                 ! (c >= '0' && c <= '9') &&
                 c != '\b' && c != '\r');

        /*Break out on carriage return.*/
        if (c == '\r')
            break;

        /*Character.*/
        switch ( c )
        {
            case '\b':
                if (i > 0)
                {
                    Bf[--i] = '\0';
                    cputs( "\b \b" );
                }
                break;
            default:
                if (i < Limit)
                {
                    Bf[i++] = (char) c;
                    Bf[i] = '\0';
                    putch( c );
                }
                break;
        }
    }

    /*Return number.*/
    sscanf(Bf, "%lx", &ret);
    return( ret );
}


/*
Routine: PrtVal() -- Print an integer value in hexadecimal, decimal
         and binary.
Inputs:  Val - Value to print.
*/

static
void PrtVal(UINT Val)
{
    register int i;
    register UINT bit;
    auto char Str[20];

    Str[19] = '\0';

:  irM :  ^ 

:  • '  *' 

        if (Val & bit)
            Str[i] = '1';
        else
            Str[i] = '0';
        bit <<= 1;
    }

    cprintf("0x%04x %5u %s", Val, Val, Str);
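
The binary column printed by PrtVal() is built by testing one bit at a time while walking a mask upward. The loop header is illegible in the listing, so the following is only a sketch under the assumption of a plain 16-character string with no digit grouping (the 20-byte Str buffer suggests the original may insert separators). to_binary16 and format_value are hypothetical names.

```c
#include <stdio.h>

/* Write the low 16 bits of `val` as '0'/'1' characters, most
   significant bit first, NUL terminated. `out` needs 17 bytes. */
void to_binary16(unsigned int val, char out[17])
{
    unsigned int bit = 1u;
    int i;

    out[16] = '\0';
    for (i = 15; i >= 0; i--) {
        out[i] = (val & bit) ? '1' : '0';
        bit <<= 1;                 /* move the mask to the next higher bit */
    }
}

/* Format hex, decimal and binary on one line, using the same format
   string as the cprintf call in the listing. */
int format_value(unsigned int val, char *line, size_t len)
{
    char bits[17];

    val &= 0xFFFFu;
    to_binary16(val, bits);
    return snprintf(line, len, "0x%04x %5u %s", val, val, bits);
}
```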


/*
Routine: InitDsp() -- Initialise the display.
Inputs:  Port - Current port value for reads and writes.
*/

static
void InitDsp(UINT Port)
{
    /*Print current base number.*/
    gotoxy(1, 14);
    clreol();
    textcolor( LIGHTBLUE );
    textbackground( LIGHTGRAY );
    cputs( "Base Value" );




        fprintf(Outfile, "%04x", Val);
        if ( (Count & 0xf) == 0xf )
            putc('\n', Outfile);
    }
    putc('\n', Outfile);
    fclose(Outfile);
}


/*
Routine: GetWord() -- reads hex words from input file
         pads end differently than scanf for odd # of bytes
Inputs:  Infile - Input file pointer
Returns: Integer value read
*/

static
UINT GetWord( FILE *Infile )
{
    auto UINT Val;
    auto int c;
    static char Buf[5] = "0000";
    auto int nibble;

    /*read 4 hex digits */
    for (nibble = 0; nibble < 4; nibble++)
    {
        /*ignore embedded spaces and newlines */
        while ( isspace(c = getc(Infile)) )
            ;

        /*store hex digits, zero pad at EOF */
        if (isxdigit(c))
            Buf[nibble] = c;
        else if (c == EOF)
            Buf[nibble] = '0';
        else
        {
            gotoxy(1, 23);
            clreol();
            cprintf("Unexpected character %c in file \x07", c);
            sleep(2);
            Buf[nibble] = '0';
        }
    }

    /*convert to integer */
    sscanf(Buf, "%x", &Val);
    return(Val);
}


/*
Routine: SendBlock() -- Transfer block from PC file to VPH memory
Inputs:  Infile - Input file pointer
         Start - Start address
         Size - Number of words to transfer
*/

static
void SendBlock( FILE *Infile, unsigned long Start, UINT Size )
{
    UINT Val;

    /*send transfer command to VPH */
    outport(Port, 0x0001);
    outport(Port, Size);
    outport(Port, (int) Start);
    outport(Port, (int) (Start >> 16));
    outport(Port, 0x0000);

    /*handle expected number of words */
    while (Size--)
    {
        /*read a word from file */
        Val = GetWord(Infile);

        /*wait until write fifo not full */
        while ( !( inport(Base + 2) & WR_FIFO_FULL) )
            ;

        /*write word */
        outport(Port, Val);
    }
}


/*
Routine: Download() -- Transfer PC file to VPH
Inputs:  Filename - Name of source file
         Format - Input file format
         Start - Start address
         Size - Number of words to transfer
*/

static
void Download( char *Filename, int Format, unsigned long Start, UINT Size)
{
    FILE *Infile;
    int Done;
    int c;

    /*open input file */
    Infile = fopen(Filename, "rt");




    if ( Infile == NULL )
    {
        gotoxy(1, 23);
        cprintf("File open failed \x07");
        sleep(3);
        return;
    }

    /*check format and handle each */
    if ( Format == HEX_FORMAT )
        SendBlock( Infile, Start, Size );
    else
    {
        /*S format - handle each record */
        Done = 0;
        while ( !Done )
        {
            /*find next 'S' and get record type */
            while ( (c = getc(Infile)) != 'S' && c != EOF)
                ;
            c = getc(Infile);

            /*handle each type */
            if (c == '3')
            {
                /*get record length and compute data words */
                fscanf(Infile, "%2x", &Size);
                Size = (Size >> 1) - 2;

                /*get start address */
                fscanf(Infile, "%8lx", &Start);

                /*transfer data field */
                SendBlock( Infile, Start, Size );
            }
            else if (c == '7')
                Done = 1;
            else if (c == EOF)
            {
                gotoxy(1, 23);
                clreol();
                cprintf("Missing end-of-file record \x07");
                sleep(3);
                Done = 1;
            }
            else
            {
                gotoxy(1, 23);
                clreol();
                cprintf("Unexpected record type %c in file \x07", c);
                sleep(3);
                Done = 1;
            }
        }
    }
    fclose( Infile );
}
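
The expression (Size >> 1) - 2 in the S-format branch turns an S3 record's byte count into a number of 16-bit data words. In the Motorola S-record format the count covers the 4 address bytes, the data bytes and 1 checksum byte, so for a data field of n words the count is 2n + 5 and the shift-and-subtract yields n. The sketch below parses just the record header from a string; parse_s3_header is a hypothetical helper and no checksum verification is attempted.

```c
#include <stdio.h>

/* Parse "S3" + 2-digit byte count + 8-digit address from `rec`.
   On success store the number of 16-bit data words and the load
   address, and return 1; return 0 for anything else. */
int parse_s3_header(const char *rec, unsigned int *words, unsigned long *addr)
{
    unsigned int count;

    if (rec[0] != 'S' || rec[1] != '3')
        return 0;
    if (sscanf(rec + 2, "%2x%8lx", &count, addr) != 2)
        return 0;
    if (count < 5)                 /* too short to hold address + checksum */
        return 0;

    /* count = 4 address bytes + 2 * words + 1 checksum byte */
    *words = (count >> 1) - 2;
    return 1;
}
```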


/*
Routine: main() -- Entry point and main routine for program.
*/

void main( int argc,
           char **argv)
{
    register UINT val;
    register int c;
    auto unsigned long Start;
    auto UINT Size;
    auto char Buffer[NAME_SIZE+1] = { NAME_SIZE };
    auto char *Filename;
    auto int Format;

    /*Get command line parameters.*/
    if (argc > 1)
    {
        /*Get I/O board base address.*/
        if (sscanf(argv[1], "%x", &Base) != 1)
        {
            fprintf(stderr, "Error, parameter one must be a "
                            "hexadecimal I/O space address.\n");
            exit( 2 );
        }
    }
    Port = Base;

    /*Set up to read command.*/
    clrscr();
    cputs("Commands: (B)ase, (P)ort, (R)ead, (W)rite, (U)pload, (D)ownload, "
          "(Q)uit");

    /*Print current port number.*/
    InitDsp( Port );

    /*Initialize the screen.*/
    for ( ; ; )
    {
        /*Show the status.*/
        ShowStat( Base );

        /*Get command.*/
        gotoxy(1, 21);



        cprintf("Command? ");
        if ( (c = getche()) == 'Q' || c == 'q')
            break;

        /*Execute command.*/
        switch ( tolower( c ) )
        {
            case 'b':
                /*Get base address number.*/
                gotoxy(1, 23);
                cprintf("Enter Base Port Address > ");

                /*Get hex number.*/
                Base = GetHexNo( 3 );

                /*Print current port number.*/
                gotoxy(1, 14);
                clreol();
                PrtVal( Base );
                break;


            case 'p':
                /*Get port number.*/
                gotoxy(1, 23);
                cprintf("Enter Port Number > ");

                /*Get hex number.*/
                Port = GetHexNo( 3 );

                /*Print current port number.*/
                gotoxy(1, 15);
                clreol();
                textcolor( LIGHTBLUE );
                textbackground( LIGHTGRAY );
                cputs( "Port Value" );
                textcolor( LIGHTGRAY );
                textbackground( BLACK );
                PrtVal( Port );
                break;

■  9*5aad%alua  froB  p^.*/ 

.  val  •  lnport(  port  ); 

^  /'Print  currant  port  nuabar.*/ 

•  gotoxyjl.  17); 

:  sstSr’*a5S8“‘‘i.“22^ 

:  HsJgJsiSfSE&'l, 

;  L’li 
:  SSf 

-  /•o«t  port  nu»B«r*»/ 

•  gotoa^iX/  23); 

-  iprlutW 


M  ); 


•); 


                /*Get hex number.*/
                val = GetHexNo( 4 );
                outport(Port, val);

                /*Print current port number.*/
                gotoxy(1, 16);
                clreol();
                textcolor( LIGHTBLUE );
                textbackground( LIGHTGRAY );
                cputs( "Write value" );
                textcolor( LIGHTGRAY );
                textbackground( BLACK );
                PrtVal( val );
                break;

            case 'u':
                /*Get start address.*/
                gotoxy(1, 23);
                cprintf("Enter Start Address > ");

                /*Get hex number.*/
                Start = GetHexNo( 8 );

                /*Get size in words.*/
                gotoxy(1, 23);
                cprintf("Enter Size in Words > ");

                /*Get hex number.*/
                Size = GetHexNo( 4 );

                /*Get filename.*/
                gotoxy(1, 23);
                cprintf("Enter Destination Filename > ");
                Filename = cgets(Buffer);

                /*Perform upload operation */
                Upload( Start, Size, Filename);
                break;

            case 'd':
                /*Get filename.*/
                gotoxy(1, 23);
                cprintf("Enter Source Filename > ");
                Filename = cgets(Buffer);

                /*check format, compute size if hex */
                Format = CheckFile( Filename, &Size );

                /*if hex format, request start address */
                if (Format == HEX_FORMAT)
                {
                    /*Get start address.*/
                    gotoxy(1, 23);
                    cprintf("Enter Start Address > ");
                    Start = GetHexNo( 8 );
                }

                /*Perform download operation */
                Download( Filename, Format, Start, Size );
                break;
        }
    }
    clrscr();


File: IOB.DOC
Last Modified: Tue Jun 30 16:18:30 1992

Date: 7/10/92
Size: 3940


PC I/O Board Driver


Installation:

To install the driver, an installable device driver entry must be placed
in the config.sys file. The entry looks like:

    Device=c:\iob.bin 340 a 1

where 'Device=' tells MS-DOS that what follows is the file name of an
installable device driver, 'c:\iob.bin' is the disk, directory and file name
of the device driver file, '340' is the base address of the I/O ports used by
the PC I/O board, 'a' is the interrupt number used by the board (interrupts
are not currently implemented) and '1' is the device unit.

At boot time, the device driver will display a header containing the name
of the driver, some information about the device driver configuration (most
of it taken straight off the device driver command line), and then print
a prompt and wait for the user to press any key.

Shown below is the exact config.sys file that *I* used to install the I/O
board drivers when I was testing them.

    files=40
    buffers=10
    break=on
    lastdrive=z
    Device=c:\himem.sys
    Device=c:\emm386.sys 2000
    device=c:\windows\smartdrv.sys 2048 1024
    device=c:\windows\ramdrive.sys 1024 /a
    Device=c:\dnadrive\station.sys units=4
    Device=c:\dnadrive\spool.sys


-  sball*a 


    Device=c:\iob.bin 340 a 1
    Device=c:\iob.bin 360 b 2


Inlt.axa  -R  o: 


This is a REAL MS-DOS driver, which means that you can use it just like
any other character device. For example, to send a file to the VPH, type

    copy vphfile.dat iob1

and MS-DOS will send the file (if it is an even number of bytes) to the
VPH.

It is important to remember that the device driver tries to mimic a
character device driver, but the PC I/O interface board is a word device.
This means that if you send q * w + r (where w is 2, r is zero or one)
bytes to the device, only q * w bytes will be received at the other end.
The device driver will report that it sent all q * w + r bytes, but the
last byte will be waiting in a buffer in the device driver, and will not
actually be sent until at least one more byte is written to the device
driver to make up a full word.

Since the driver is a real MS-DOS device driver, it can be accessed
just like any other file or device from any programming language that
supports file I/O.
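
The byte-versus-word behaviour described above can be modelled in a few lines of C. This is only a sketch of the buffering rule, not the driver's code (which is written in assembly): WordPacker and packer_write are hypothetical names, and the byte order (held byte in the low half of the word) is an assumption taken from the order in which the Write routine loads AL and AH.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    int     have_spare;   /* 1 if a byte is waiting for its partner */
    uint8_t spare;        /* the held-back byte */
} WordPacker;

/* Accept `len` bytes and emit complete 16-bit words into `out`.
   Returns the number of words emitted; an odd trailing byte stays
   in the packer until the next call supplies its partner.
   `out` must have room for (len + 1) / 2 words. */
size_t packer_write(WordPacker *p, const uint8_t *bytes, size_t len,
                    uint16_t *out)
{
    size_t i, words = 0;

    for (i = 0; i < len; i++) {
        if (p->have_spare) {
            /* held byte becomes the low half, new byte the high half */
            out[words++] = (uint16_t)(p->spare | ((uint16_t)bytes[i] << 8));
            p->have_spare = 0;
        } else {
            p->spare = bytes[i];
            p->have_spare = 1;
        }
    }
    return words;
}
```

Writing three bytes emits one word and leaves the third byte pending; a later one-byte write completes the second word, which matches the q * w + r description.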

A list of the device driver functions that this device supports is:

     0 - Initialization. This function is NEVER accessed by the user.
     4 - Read. Read data from the device.
     6 - Input Status. Determine if there is any data to read.
     7 - Input Flush. Throws away any data in the input buffer.
     8 - Write. Write data to the device.
    10 - Output Status. Determines whether the output buffer is empty.
    16 - Output Until Busy. Output until device output buffers are full.
         This is synonymous with the Write function for this device.
    19 - Generic IO Control. Send commands to the device. The device
         currently only supports one command; reset.
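
The function numbers in the list above can be captured as a small lookup table, for instance in a test harness that logs which request a driver received. The enum and function names below are hypothetical; only the numeric codes and their descriptions come from the list.

```c
#include <stddef.h>

/* Function codes supported by the PC I/O board driver, per the list above. */
enum IobFunction {
    IOB_INIT              = 0,
    IOB_READ              = 4,
    IOB_INPUT_STATUS      = 6,
    IOB_INPUT_FLUSH       = 7,
    IOB_WRITE             = 8,
    IOB_OUTPUT_STATUS     = 10,
    IOB_OUTPUT_UNTIL_BUSY = 16,
    IOB_GENERIC_IOCTL     = 19
};

/* Return a printable name for a supported code, or NULL otherwise. */
const char *iob_function_name(int code)
{
    switch (code) {
    case IOB_INIT:              return "Initialization";
    case IOB_READ:              return "Read";
    case IOB_INPUT_STATUS:      return "Input Status";
    case IOB_INPUT_FLUSH:       return "Input Flush";
    case IOB_WRITE:             return "Write";
    case IOB_OUTPUT_STATUS:     return "Output Status";
    case IOB_OUTPUT_UNTIL_BUSY: return "Output Until Busy";
    case IOB_GENERIC_IOCTL:     return "Generic IO Control";
    default:                    return NULL;
    }
}
```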

Debugging:

The PC I/O Board driver is written in pure assembly language, and is
NOT debuggable by any of STC's in-house software debuggers. There are two
ways to track down bugs: code inspection and documentation review, and
checkpoint dumps. The first is the recommended way; the second is useful
when the programmer becomes too lazy or frustrated to use the first.

A checkpoint dump consists of allocating a big enough buffer in
memory to store the relevant information, and inserting code into the


File: IOB.INC

Date: 7/10/92
Size: 3427
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-  lOCTLM 

-  (tiling 

-  SlUaU 


la  true 

ILangth  ot  raaord  la  bytaa 
ot  usad. 


glPunctloadb?;  Punctloa  auabar. 
gi>tatusd«7;  ftataras  tha  status. 
gmasatvadldM  dup  (7);  Spars  sp 


103  •  glCatagorydb7i  Catago: 

104  -  glMlaorCooMb?;  Mlmr 


ipara  spaca. 
it  dsvica  dxlvar. 


omaaarvsd2dd7 1  Spars  spaoS; 


106  -  gllOCTLSatadd7;  Polatar  to  loCTL  data  atruetura. 

107  >  lOCIUnguUTBDda 


File: IOBDBG.ASM
Last Modified: Tue Jun 30 16:18:32 1992


RMd  data  from  davica.  Thla  routloa  baa  two  antry  points 


•  zoctuiaadt 

-  JUMdl 

•  ;Srror  chaok  for  poaalbla  aaro  byta  raada* 

»  aovcx^  aai [dl] .IxvrBytao 

m  oapeXf  0 

•  InsHonZaroRaad 

-  ]apMtun>0K 

-  lOat  polaur  to  road  bultor,  care  out  bytaa  raad  ooantar 

•  >oiiZarallaadt 

•  laadl,  aa>(dlj  .IxwrBudar 

•  xoral,  al 

-  ;If  tbora  la  a  aparo  byta  to  raad/  gat  It. 

:  5352iS«'  ® 

-  ;Claar  flag. 

-  aovlulFlag/  0 

•  :aat  byta  and  atora  In  raad  butfar. 

-  aoval,  RdBftr 

-  ■OVMJtOl).  dl 

•  inedl 

•  iacai 

•  daoax 

-  gMta^na  If  tbara  ara  any  coaplata  worda  to  raad. 

-  Bovba.  os 

-  atarcS/  1 

-  jossMadayta 

-  ;llaad  tba  propar  nuabar  of  uorda. 

-  aovds,  ICSaaa 

•  PsadMordai 

•  }Xf  tba  toad  FIFO  la  oapty,  raad  la  ooaplata. 

•  adddiCp  2 

-  loax,  ds 

-  taatas,  OtOb 

-  jtbdPona 

-  ;toad  word. 

-  aubdx,  3 

-  Inas,  ds 

-  ;Sava  word. 

•  Bovosi(dl],  as 

-  adddl,  3 

•  addal,  3 

-  ;Cback  to  aaa  If  aaough  words  bava  boon  raad. 

-  aubbs.  2 

-  loopMadWorda 

-  ;If  tbara  la  a  byta  laft  to  raad,  do  ao. 

-  toadByta: 

-  taatbs,  1 

-  jsMDono 

-  ;Cbock  to  aaa  If  toad  Fife  la  aapty. 

-  aovds,  IQtoaa 

-  addds,  2 

-  Inas,  ds 

-  taatas,  MOh 


;toad  tba  word,  aava  bytaa. 

aubds,  3 

Inas,  ds 

tovaaildl],  al 

Inoal 

Inedi 

aovMBffr,  ah 
aovRdFlag,  1 

iCaloolata  and  aava  tba  oaa 
WdDooat 
laadl,  togPkt 
aovaaildli.lrwrlytaa,  at 


of  bytaa 


Date: 7/10/92
Size: 19580


-  popda 

-  popaa 

-  popex 


-  japBataziiOK 


-  ;Ioptti 

-  ;iiot. 


itatatoa - Dataialna  wtaattiar  thara  ara  any  cbaraccaia  to  caad  or 


-  Inpatstatua: 

•  ;Cliack  to  aaa  If  tPara  la  a  aavad  raad  byta. 

-  CBpMFlag,  0 

-  jnaRfPotlipty 

-  ;aat  tba  atatua  of  tba  raad  PXVO. 

-  Bovdx,  lOSaaa 

-  adddx,  2 

-  Inax,  dx 

-  taatax,  090b 

-  Inxaraotlapty 

-  raad  FIFO  la  aapty,  ratum  buay. 

-  BOSX 

-  xoral,  al~ 


■ovaa:fdl].rhStataa,  ax 
^■pBadlntarrupt 


-  ]Tbo  raad  FIFO  la  not  aapty,  ratum  that  thara  la  a  charactar  In  It. 

-  RFMotBapty: 

-  Bovab,  hB  OK 

-  xoral,  al~ 

-  aovaai Idl] .rhfltatua,  ax 

-  japBndlntarrupt 

-  IlnputFluab  - —  Input  fluah. 

-  Inpnttluahi 

-  jSat  flag  to  no  aavad  raad  charactar. 

-  aovBdFlag,  O 

-  ;llaad  worda  froa  tha  raad  FIFO  until  than  ara  no  aora. 

-  aovdx,  lOBaaa 

•  XnFluahLpi 

-  adddx.  2 

•  Inax,  dx 

•  taatax,  oaoh 

-  IxInFluahOoaa 

-  aubdx,  2 

-  Inax,  dx 

•  japloFluahlip 

-  iDona  with  fluah. 

•  InFluahDonat 


InFluahDonat 

japaatumOK 


-  /OutputBtatua  —  Dataralna  whathar  all  obaraotara  bava  baan  raad  or  not. 


-  Outputftataai 

-  to  aaa  If  thara  la  a  aavad  wrlta  byta. 

-  capHrtFlag,  0 

-  JnaWFBotBapty 

-  /Oat  tha  atatua  of  tba  wrlta  FIFO. 

-  aovdx,  lOBaaa 

•  adddx,  2 

-  Inax,  dx 

•  taatax,  08h 

-  jnxMFBotBapty 

-  ;J|>a  ra^  FIFO  la  aapty,  ratum  buay. 

-  amaE^^ra  BUBI 

-  xoral,  al~ 

-  aovaa;fdl].rbatatua,  ax 

-  JapBndlntarrupt 


£SwS«.n**?  ratum  that  thara  la  a  charactar  la  It. 

aovabTn^W 

xoral,  al 

^aattdll.rhatatna,  ax 
japBadIntarrupt 


{opanDvc/CloaaPve  - —  RaM  tto  davloa. 

I - - - - - - - 

OpaoDvct 

CloaaOvoi 

japBatanOR 

/BatabllM  tha  baaa  addraaa  of  tba  board, 
aovdx,  lOBaaa 
adddx,  2 

;Raaat  tha  board, 
aovax,  cnjnm 
ootdx,  ax 


riias  laaoM.AaM 

Lut  Nodlflad:  Tua  Jim  30  16i].Bs32  1992 


•  I Mow  aat  noraal  oparatlng  valuaa. 

-  Bovax,  CK  BASE  VAlUE 

-  outOx,  ax"  " 

-  ;Claar  raad  and  wlta  buffar  flaga. 

-  BovndFlag,  0 

-  aovWrtriag,  0 


iDona. 

japRatuniOK 


;Ud1bp1 


rhStatua,  ax 
ntaxmpt 

-  ;rf  ami  ocanlatad  suecaafully. 

-  XaturnOK: 

-  laadl,  caiRaqPkt 

-  Bovab.  KB  OK 

-  xoxal,  al" 

-  Bovaai [dl] .rhdtatua,  ax 

477  -  BndXntarzupti 
'■  -  popM 

-  popda 

-  pOMl 

-  popdx 

-  popn 

-  popbx 

•  jlnltlallxa  tba  drlvar  by  aavlng  tha  addraaa  of  tba  xaquaat  baadar 

-  jpaekat. 

-  Btratagyi 

•  Bovword  ptr  osiRaqPkt,  bx 

-  Bovwoxd  ptr  caiKaqPkt  >  2,  aa 


•  ;0uap  %rhat  la  balow  tbla  point 

497  •  ^••nmmmmmmmmmmmmmmmmmmmmmmmmmm 

49S  -  BndOrlvari 


301  -  iPrlntAry  Print  tba  propar  nuBbar  of  cbaractara  to  tba  acraan 


obaraotara. 


It  obaraotara. 

rSiicnkj 

leopPrtArylip 

;Katttm  froa  print  loop, 
popdx 


andpPrlntAry 


iPrlntMag  —  Print  an  arror  Baaaaga  to  tba  acraan. 
;dx-  contalna  tba  offaat  of  tba  arror  Baaaaga  to  print 


Pagas 



incbx 

loopLnBndflaaroh 


;Xf  M  got  wo  could  not  find  ond  of  lino. 

■ovdx,  offoot  BodCadLino 

oollPrlntMog 

jB^ndXnit 


-  sFound  tbo  ond  of  tno  cirind  lino  poroaotorof  ootoblioh  count. 


fndBhdLihoi 
■ovoXf  bx 
wjvbix,  word  per  ostCadLln. 
■id)Ox,  bic 


6kl^rbglla _ 

KivSl,  u:llax] 


juntfro^Na 


laiul^rogliaM 
Incbx 

loopSklpProgHa 


;If  %fa  got  bar.,  wa  ara  at  and  of  eniaund  llna. 

Bovdx,  offaat  SyntaxBrr 

oallPrlntMag 

j^B^Inlt 


;Sklp  whlta  apaoa  and  gat  tha  flrat  para 

tbaaa  addraaa. 

ndProgMaaa: 
oallSkipWhlta 
callOatBax 
■ovoailOBaaa,  ax 


atar,  wbloh  la  tha  I/O 


iCtaack  for  lagallty  of  lOBaaa  addraaa. 

eapax,  3ffh 

jluaaaOK 

aovdx,  offaat  BadlOBaaa 

oallPrlntMag 

japBadlnlt 


^Oat^acond  paraaatar,  which  la  tha  hardwara  Intarrupt  vactor  nuabar 


callSklpWhlta 
callOatBax 
■ovoatintvacllo,  ax 


;la  thla  Intarrupt  vactor  nuabar  107 

;control  raglatar  valua. 

capax.  10 

jnaChkIntll 

aovoaiCRlntaot  0 

japOatDvoMo 


If  ao,  convart  to  tha  propar 


;Ia  thla  Intarrupt  vactor  nuabar  117 
;eontrol  raglatar  valua. 

Chklntlli 
capax,  11 
jna^Intl2 
aovcarCRIntMo, 
japOatOvcMo 


If  ao,  convart  to  tha  propar 


lOOOh 


;Ia  thla  Intarrupt  vactor  nuabar  127 

i  control  raglatar  valua. 
hklntl2: 


If  BO,  convart  to  tbo  proper 


capax,  12 
jnaChkIntlS 


BOvca:CRIntao,  2000h 
JapOatOvcMo 


I la  thla  IntarruDt  vactor  nuabar  127 

-  -  ■  th  a . 


lexlt  driver  with 


_  ^  .  If  not,  print  an  error  aaaaago 

bad  Inltlallxatlon. 

hklntlSi 
capax,  IS 
JalalntlS 

Bovdx,  offaat  ladlntMo 

callPrlntMag 

japBadlnlt 


;Thla  la  Intarrupt  vactor  nuabar  15,  convert  to  tha  propar  control 

;raglatar  valua. 

iBlntlS: 

aovoaiCRlntBo,  lOOOh 


>0at  tha  device  nuabar. 
OatBvcMo: 
caliaklpHhlta 
oalioatiax 


{Modify  tha  driver  naaa  In  tha  boador  atructura. 
laabx,  DvcBdr.dhHaaoOrUnlta 
addal,  '0' 
aovlbx  *  3],  al 


{Print  driver  title, 
aovdx,  offaat  BdiMag 
callPrlntMag 


{Print  tha  daacrlptlon  for  tha  driver  naaa. 

aovdx,  offaat  OrvrBaaa 

callPrlntMag 


{Print  tha  driver 

puahaa 

aovex,  S 
aovbx,  ca 




-  ;amt  writ*  count  and  arror  cback  lor  poaslblo  loro  longtli  wrltaa. 

-  Bovcx,  as: (dl] .irwrBytaa 

-  capex,  0 

-  jnaMonZaroWrt 

-  japRaturnOK 

6  -  ;Oat  pointar  to  wrlta  buffar,  aava  wrlta  buffar  alsa  and  xaro  out  byta 

7  -  iwrlta  count. 

8  -  nonZaroWrt: 

9  -  laadl,  aa: (dl] .IxwrBuffar 
0  -  xorsi,  al 

2  -  ;Is  tbara  a  spars  byta  to  writs? 

3  -  capWrtPlag,  0^ 

204  -  JaMrlts 

-  ;Cbsak  to  aaka  aura  that  tbs  Wrlta  riVO  la  aot  full. 

-  Bovdx,  lOBaaa 

-  adddx,  2 

-  Inax,  ds 

-  tastax,  BR  WRT  FULL 
211  -  jxwrtOaoa  ~ 

-  ;Baro  out  flag. 

214  -  aovWrtriag,  0 

-  iBulld  word  to  writs. 

-  anval,  WrtBffr 

-  aovab,  asiCdl) 


File: IOBDRVR.ASM
Last Modified: Tue Jun 30 16:18:34 1992

Date: 7/10/92
Size: 18790




-  '0' 

-  32oSf**  *** 

-  ;Prlnt  valu*  to  atrloo. 

-  BOVCX,  4 

-  FrtBaicLpi 

-  ;0at  currant  baoi  obaractar. 

-  aovdl,  al 

ofh 


-  ;CiiMk  to  aaa  If  daolaal  (0-9)  or  hlghar  (a-f). 

-  ^pdlf  Oan 

-  jlfaOaolaal 

~  (•-<)•  oonvact  to  oharaotar  to  dlaplay. 

-  aubdl,  Oab 

-  adddl,  'a' 

-  JapBxtChar 

-  ;Ia  dMlaal,  oonvart  to  oharaotar  to  dlaplay. 

-  laoaclaal: 

-  adddl,  '0' 

-  :Stora  oharaotar. 

-  HxtChati 

-  daSS**  “ 

-  /Shift  naxt  oharaotar  Into  lovaat  nlbhla. 

-  abrax,  1 

-  ahrax,  1 

-  ahrax,  1 
•  ahrax,  1 

-  loopPrtSaxLp 

-  /Craata  polntar  to  atrlno. 

-  Boyte,  olfaat  Oapstr 

-  oallPrlntMag 

-  /Saatora  raglatara  and  ratum. 

:sF 

-  andptrtBax 
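
The hex-printing loop above takes the lowest nibble of the value, maps 0-9 and a-f to characters, stores the character, and shifts the value right by four bits, four times over. A minimal C sketch of the same conversion follows; word_to_hex is a hypothetical name, and filling the buffer from right to left is an assumption, since the store instructions in the listing are illegible.

```c
/* Convert the low 16 bits of `value` into four lowercase hex digits.
   Digits are produced from the least significant nibble upward and
   stored right to left. `out` needs 5 bytes. */
void word_to_hex(unsigned int value, char out[5])
{
    int i;

    out[4] = '\0';
    for (i = 3; i >= 0; i--) {
        unsigned int nibble = value & 0x0Fu;

        /* 0-9 map to '0'..'9'; 10-15 map to 'a'..'f' */
        out[i] = (char)(nibble < 10 ? '0' + nibble : 'a' + (nibble - 10));
        value >>= 4;               /* shift next nibble into lowest position */
    }
}
```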


-  /OatBax  —  Oat  a  haxadoclaal  nuabar  froa  a  atrlng. 

-  ***  ***•  “*  **“  baxadacmal  nuabar. 

proc    GetHex
;Set up a loop counter.
        push    cx
        xor     ax, ax
        mov     cx, 4
GetDigits:
;Test for in 0-9.
        mov     dl, es:[bx]
        cmp     dl, '0'

-  JlfotOacBrr 

-  c^l,  '9' 

-  JofotOaoirr 


;Test for in A-F.
NotLoCase:
        sub     dl, 'A'         ;Is hexadecimal, convert.
        add     dl, 0ah

;Add another hex digit.
AddHexDigit:
        shl     ax, 1
        shl     ax, 1
        shl     ax, 1
        shl     ax, 1
        or      al, dl

;Go around again.
        loop    GetDigits
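The digit-accumulation step of GetHex (shift the running value left by four bits, then OR in the new digit) can be sketched in Python. The function name `get_hex` is hypothetical, and two behaviours are assumptions because the relevant lines of the listing are illegible: lowercase a-f is accepted, and parsing stops quietly at the first non-hex character instead of reporting an error.

```python
def get_hex(text: str, max_digits: int = 4) -> int:
    """Parse up to max_digits leading hex digits of text into a 16-bit value."""
    ax = 0
    for ch in text[:max_digits]:            # loop counter of 4, as in the listing
        if "0" <= ch <= "9":
            dl = ord(ch) - ord("0")
        elif "A" <= ch <= "F":
            dl = ord(ch) - ord("A") + 0x0A  # sub dl, 'A' / add dl, 0ah
        elif "a" <= ch <= "f":
            dl = ord(ch) - ord("a") + 0x0A  # assumption: lowercase handled alike
        else:
            break                           # assumption: stop at first non-hex char
        ax = ((ax << 4) | dl) & 0xFFFF      # four shl ax, 1 then or al, dl
    return ax
```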




;Initialize the driver and get parameters from the Config.sys
;command line.

;Print driver info.
Initialize:
;Get a pointer to the config.sys command line.
        les     bx, es:[di].IrParamAddress
        mov     word ptr CmdLine, bx
        mov     word ptr CmdLine + 2, es
;Search for end of line.
        mov     cx, 300h
LpEndSearch:
        mov     dl, byte ptr es:[bx]
        cmp     dl, 0dh
        je      FndEndLine
        cmp     dl, 0ah
        je      FndEndLine
        inc     bx
        loop    LpEndSearch
;If we get here, we could not find end of line.
        mov     dx, offset BadCmdLine
        call    PrintMsg
        jmp     BadInit
;Found the end of the command line parameters, establish count.
FndEndLine:
        mov     cx, bx
        mov     bx, word ptr cs:CmdLine
        sub     cx, bx

i55|frS5S2Sr 

rsi-J 

1 alndproghaao 
        cmp     dl, 09h
        je      FndProgName
        inc     bx
        loop    SkipProgName
;If we got here, we are at end of command line.
        mov     dx, offset SyntaxErr
        call    PrintMsg
        jmp     BadInit
;Skip white space and get the first parameter, which is the I/O
;base address.
FndProgName:
        call    SkipWhite
        call    GetHex
        mov     cs:IOBase, ax
;Check for legality of IOBase address.
        cmp     ax, 3ffh
        jle     BaseOK
        mov     dx, offset BadIOBase
        call    PrintMsg
        jmp     BadInit
;Get second parameter, which is the hardware interrupt vector number.
BaseOK:
        call    SkipWhite
        call    GetHex
        mov     cs:IntVecNo, ax
;Is this interrupt vector number 10? If so, convert to the proper
;control register value.
        cmp     ax, 10
        jne     ChkInt11
        mov     cs:CRIntNo, 0
        jmp     GetDvcNo
;Is this interrupt vector number 11? If so, convert to the proper
;control register value.
ChkInt11:
        cmp     ax, 11
        jne     ChkInt12
        mov     cs:CRIntNo, 1000h
        jmp     GetDvcNo
;Is this interrupt vector number 12? If so, convert to the proper
;control register value.
ChkInt12:
        cmp     ax, 12
        jne     ChkInt15
        mov     cs:CRIntNo, 2000h
        jmp     GetDvcNo
;Is this interrupt vector number 15? If not, print an error message and
;exit driver with a bad initialization.
ChkInt15:
        cmp     ax, 15
        je      IsInt15
        mov     dx, offset BadIntNo
        call    PrintMsg
        jmp     BadInit
;This is interrupt vector number 15, convert to the proper control
;register value.
IsInt15:
        mov     cs:CRIntNo, 3000h
;Get the device number.
GetDvcNo:
        call    SkipWhite
        call    GetHex
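The parameter checks above amount to a range test on the I/O base address followed by a small lookup from interrupt vector number to control register value. A minimal Python sketch of that logic follows; the names `CR_INT_VALUES`, `MAX_IO_BASE` and `check_init_params` are hypothetical, and the vector numbers are taken as the literals that appear in the listing's compare instructions.

```python
# Control register values for the supported interrupt vectors, as read
# from the listing (vector number -> value stored for the control register).
CR_INT_VALUES = {10: 0x0000, 11: 0x1000, 12: 0x2000, 15: 0x3000}
MAX_IO_BASE = 0x3FF  # largest I/O base address the driver accepts


def check_init_params(io_base: int, int_vec_no: int) -> int:
    """Validate the config.sys parameters and return the control register value."""
    if io_base > MAX_IO_BASE:
        raise ValueError("bad I/O base address")
    if int_vec_no not in CR_INT_VALUES:
        raise ValueError("bad interrupt vector number")
    return CR_INT_VALUES[int_vec_no]
```

A table lookup replaces the chain of compare-and-jump tests; the behaviour is the same for the four supported vectors.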


;Modify the driver name in the header structure.
        lea     bx, DvcHdr.dhNameOrUnits
        add     al, '0'
        mov     [bx + 3], al

;Prlnt  drlvar  tltla. 
aovdx,  offaat  adxMag 
oallPrlntMag 

tPrlnt  tba  daacrlptlon  for  tba  drlvar  naaa. 

aovdx,  offaat  DrvrMaaa 

callPrlntNag 

iPrlnt  tba  drlvar  naaa. 

puabaa 

aovex,  t 
aovbx,  oa 

laabx^  OvcBdr.dhbaaaOrUnlta 
callPrlntAry 


iPrlnt  carriage  ratum,  naw  llna. 
aovdx,  offaat  Crlf 
callPrlntNag 

;Bcbo  tba  loBaaa  addraaa  value. 

aovdx,  oftaat  Btatl 

callPrlntNag 

aovax,  lOBaaa 

oallPrtBax 

aovdx,  offaat  Crtf 

callPrlntNag 


)Bobo  the  Intarrupt  vector  valua. 

aovdx,  offaat  btaU 

callPrlntNag 

aovax,  IntVaoRo 

oallPrtbax 

aovdx,  offaat  Crl>f 

callPrlntNag 

I Print  tba  abaoluta  addraaa  of  tbla  drlvar. 

aovdx,  offaat  bbabddr 

callPrlntNag 

aovax,  ca 

callPrtBax 

aovdl, 

aovab,  03b 

lnt21h 

Iaabx,  caiDvcBdr 
aovax,  bx 
oallPrtBax 
aovdx,  offaat  CrLf 
callPrlntNag 

;Initialize the board. Begin by resetting all registers.

        mov     dx, cs:IOBase

        add     dx, 2

aovax,  CB  IIBSir 

outdx,  ax~ 

;Now set control register with interrupt number and interrupts turned
;off.

aovax,  CR_BMn  VUUB 
cutdx,  ax 

;Successful initialization, set proper values in packet, return.
        les     di, cs:ReqPkt
        mov     es:[di].IrStatus, 100h
        mov     word ptr es:[di].IrEndAddress, offset EndDriver
        mov     word ptr es:[di].IrEndAddress + 2, cs
        call    Pause
        jmp     EndInterrupt

;Error in initialization, set proper values in packet, return.
        les     di, cs:ReqPkt




        mov     es:[di].IrStatus, 8100h
        mov     word ptr es:[di].IrEndAddress, offset EndDriver
        mov     word ptr es:[di].IrEndAddress + 2, cs
        mov     es:[di].IrMessageFlag, 1
        call    Pause
        jmp     EndInterrupt

.taataiida 

and 


Date: 7/10/92
81s«t  25Se 


File: PROTOCOL.TXT

Last  Modlflad:  Thu  Jul  09  l7:13Ue  1992 


Yet Another PC I/O Board Driver Design Document


Software structure of MS-DOS PC device driver:

Write Packet

    Set the write size to be the size of the write buffer.
    Clear the write size value in the request header.
    While the write size is greater than zero
    do
        If the write size is greater than or equal to 8188 bytes
        then
            Packet size is 8188 bytes.
        else
            Packet size is write size.
        endif
        Divide the packet size in bytes by two to get size in words.
        Write the word size of the packet to the write FIFO.
        Initialize the checksum with the packet size.
        Write the packet to the write FIFO and sum the words to
        calculate the checksum.
        Write the checksum to the write FIFO.
        Set SYSI0.
        Loop
            Read status register.
        while SYSI1 bit is clear.
        Clear SYSI0.
        Loop
            Read status register.
        while SYSI1 is set.
        Subtract twice the packet size from the write size.
    done
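The framing part of the Write Packet procedure (split the buffer, prefix the word count, append a checksum) can be sketched in Python. This is an illustration under stated assumptions, not the driver's implementation: `build_packets` and `max_bytes` are hypothetical names, the per-packet limit is read here as 8188 bytes, words are taken as 16-bit little-endian, and the checksum word is assumed to be the two's complement of the running sum so that a receiver summing every word of the packet arrives at zero. The status-register handshake is hardware specific and is left out.

```python
def build_packets(data: bytes, max_bytes: int = 8188) -> list[list[int]]:
    """Split data into FIFO packets: [word count, payload words..., checksum]."""
    if len(data) % 2 or max_bytes % 2:
        raise ValueError("this sketch handles whole 16-bit words only")
    packets = []
    for start in range(0, len(data), max_bytes):
        chunk = data[start:start + max_bytes]
        # Packet size in bytes divided by two gives the size in words.
        words = [int.from_bytes(chunk[i:i + 2], "little")
                 for i in range(0, len(chunk), 2)]
        size = len(words)
        total = size                          # checksum starts with the packet size
        for word in words:
            total = (total + word) & 0xFFFF   # sum the words as they are written
        checksum = (-total) & 0xFFFF          # assumption: negated sum is sent
        packets.append([size, *words, checksum])
    return packets
```

With this convention every complete packet sums to zero modulo 65536, which matches the non-zero test in the Read Packet procedure.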

Read Packet

    If there is a partial packet still waiting in the read FIFO
    then
        Set the read size equal to the remaining packet size.
    else if SYSI0 is clear
    then
        Set the busy bit in the request header and return.
    else
        Read the packet size from the read FIFO.
        Save the packet size in the packet size buffer.
        Save the packet size as the initial checksum value.
        Set the read size equal to the packet size.
    endif
    Divide the read buffer size by two to get the word count.
    If the read size > read buffer size
    then
        Set the read size to the read buffer size.
    endif
    Read words from the read FIFO, summing them for the checksum.
    If the number of words read is less than the packet size
    then
        Set the read size.
        Return.
    else if the entire packet has been read
    then
        Read the checksum word from the read FIFO and add to the checksum.
        Set SYSI1 in the control register.
        Read status register.
        while SYSI0 is set.
        Clear SYSI1.
        If the checksum is non-zero
        then
            Return a checksum error.
        else
            Set the packet size buffer to zero.
            Return success.
        endif
    endif
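The checksum test at the end of Read Packet can be sketched in Python for the simple case where a whole packet is available at once. The name `verify_packet` is hypothetical, words are assumed to be 16 bits wide, and partial reads, the busy bit and the status-register handshake are omitted.

```python
def verify_packet(fifo_words: list[int]) -> list[int]:
    """Check one complete packet [size, payload..., checksum]; return the payload."""
    if not fifo_words:
        raise ValueError("empty FIFO")
    size = fifo_words[0]                      # packet size in words
    if len(fifo_words) < size + 2:
        raise ValueError("incomplete packet")
    payload = fifo_words[1:1 + size]
    total = size                              # packet size is the initial checksum
    for word in payload:
        total = (total + word) & 0xFFFF       # sum words as they are read
    total = (total + fifo_words[1 + size]) & 0xFFFF  # add the checksum word
    if total != 0:
        raise ValueError("checksum error")    # non-zero sum means corruption
    return payload
```

The sketch accepts a packet only when the size word, the payload and the checksum word sum to zero, which is the success condition stated in the pseudocode.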


APPENDIX  D 


MICROINSTRUCTION  FORMAT 


[Figure: microinstruction format. Address generator board fields: microsequencer (branch address, instruction), memory control, immediate data fields, address register file (port A and port B write source and write address), address ports, address RAM #1 and #2, comparator, and two-dimensional counters #1 through #4, with the remaining bits undefined. Processor board fields: multiplier #1, multiplier #2, ALU #1, ALU #2.]

APPENDIX E

CPH DEFINITION FILE


Date: 6/30/92
Size: 53964


File: CPB.FIX
Last Modified: Fri Feb 21 16:26:02 1992



/* preliminary micro asm definition for CPH */

WIDTH  = 768
PHASES = 1
DEFBIT = 0

/* MICROPROGRAM SEQUENCER (FIELD SEQ) */

SEQ[24]
{
    DEFAULT = 0X00007F
    /* BRANCH ADDRESS */
    BRA[16]
    {
        DEFAULT = 0X0000
        LABEL
    }
    /* INSTRUCTION */
    INS[8]
    {
        DEFAULT = 0X7F  /* CONTINUE */
        CONT    = 0X7F  /* CONTINUE */
        LDLC    = 0X7E  /* LOAD LOOP COUNTER */
        LDSP    = 0X7D  /* LOAD STACK POINTER */
        LDSRP   = 0X7C  /* LOAD SUBROUTINE RAM POINTER */
        LDSUBR  = 0X7B  /* LOAD SUBROUTINE RAM */
        SIM     = 0X7A  /* SET INTERRUPT MASK BITS */
        RIM     = 0X79  /* RESET INTERRUPT MASK BITS */
        RINT    = 0X78  /* RESETS INTERRUPTS */
        JI      = 0X77  /* JUMP IMMEDIATE */
        JIC     = 0X76  /* CONDITIONAL JUMP IMMEDIATE */
        JR      = 0X75  /* JUMP RELATIVE */
        JRC     = 0X74  /* CONDITIONAL JUMP RELATIVE */
        LI      = 0X73  /* LOOP IMMEDIATE */
        LR      = 0X72  /* LOOP RELATIVE */
        LS      = 0X71  /* LOOP TOP OF STACK */
        TWBI    = 0X70  /* IMMEDIATE THREE WAY BRANCH */
        TWBR    = 0X6F  /* RELATIVE THREE WAY BRANCH */
        CALL    = 0X6E  /* CALL SUBROUTINE */
        CALLC   = 0X6D  /* CONDITIONAL CALL SUBROUTINE */
        RET     = 0X6C  /* RETURN FROM SUBROUTINE */
        RETC    = 0X6B  /* CONDITIONAL RETURN FROM SUBROUTINE */
        PUSH    = 0X6A  /* PUSH STACK */
        PUSHC   = 0X69  /* CONDITIONAL PUSH STACK */
        PLDLC   = 0X68  /* PUSH STACK AND LOAD COUNTER */
        PLDLCC  = 0X67  /* PUSH STACK AND CONDITIONALLY LOAD COUNTER */
        POP     = 0X66  /* POP STACK */
        POPC    = 0X65  /* CONDITIONAL POP STACK */
        EI      = 0X64  /* ENABLE ALL UNMASKED INTERRUPTS */
        DI      = 0X63  /* DISABLE ALL INTERRUPTS */
    }
}
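How the two subfields combine into the 24-bit sequencer field can be sketched in Python. The layout is an inference, not something the definition file states: since the SEQ default 0X00007F equals the CONT opcode with a zero branch address, the sketch assumes INS occupies the low 8 bits and BRA the 16 bits above it. The names `pack_seq` and `OPCODES` are hypothetical, and only three opcodes from the table are included.

```python
INS_BITS = 8    # width of the INS subfield
BRA_BITS = 16   # width of the BRA subfield

# A few sequencer opcodes copied from the definition file.
OPCODES = {"CONT": 0x7F, "JI": 0x77, "RET": 0x6C}


def pack_seq(ins: str, bra: int = 0) -> int:
    """Pack an instruction mnemonic and branch address into the SEQ field."""
    if not 0 <= bra < (1 << BRA_BITS):
        raise ValueError("branch address does not fit in 16 bits")
    # Assumed layout: BRA in bits 23..8, INS in bits 7..0.
    return (bra << INS_BITS) | OPCODES[ins]
```

Under this assumed layout, `pack_seq("CONT")` reproduces the field default and `pack_seq("JI", 0x1234)` places the jump target in the upper 16 bits.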

/* CONDITION CODE SELECT (FIELD CCS) */

CCS[8]
    DEFAULT = 0B11111111
    /* SELECT */
    SEL[7]
        DEFAULT = 0B1111111

        /* FLAGS FOR MULTIPLIER #1 */
        M1INT   = 0B0010000  /* INTERRUPT */
        M1PE    = 0B0010001  /* PARITY ERROR */
        M1N     = 0B0010010  /* NEGATIVE */
        M1ZR    = 0B0010011  /* ZERO */
        M1OV    = 0B0010100  /* OVERFLOW */
        M1UF    = 0B0010101  /* UNDERFLOW */
        M1INX   = 0B0010110  /* INEXACT */
        M1INV   = 0B0010111  /* INVALID OPERATION */
        M1NAN   = 0B0011000  /* NOT A NUMBER */
        M1RND   = 0B0011001  /* ROUND UP */
        M1DEN   = 0B0011010  /* DENORMALIZED */
        M1DIVZ  = 0B0011011  /* DIVIDE BY ZERO */

        /* FLAGS FOR MULTIPLIER #2 */
        M2INT   = 0B0011100  /* INTERRUPT */
        M2PE    = 0B0011101  /* PARITY ERROR */
        M2N     = 0B0011110  /* NEGATIVE */
        M2ZR    = 0B0011111  /* ZERO */
        M2OV    = 0B0100000  /* OVERFLOW */
        M2UF    = 0B0100001  /* UNDERFLOW */
        M2INX   = 0B0100010  /* INEXACT */
        M2INV   = 0B0100011  /* INVALID OPERATION */
        M2NAN   = 0B0100100  /* NOT A NUMBER */
        M2RND   = 0B0100101  /* ROUND UP */
        M2DEN   = 0B0100110  /* DENORMALIZED */
        M2DIVZ  = 0B0100111  /* DIVIDE BY ZERO */

        /* FLAGS FOR ALU #1 */
        A1INT   = 0B0101000  /* INTERRUPT */
        A1PE    = 0B0101001  /* PARITY ERROR */
        A1N     = 0B0101010  /* NEGATIVE */
        A1ZR    = 0B0101011  /* ZERO */
        A1OV    = 0B0101100  /* OVERFLOW */
        A1UF    = 0B0101101  /* UNDERFLOW */
        A1INX   = 0B0101110  /* INEXACT */
        A1INV   = 0B0101111  /* INVALID OPERATION */
        A1NAN   = 0B0110000  /* NOT A NUMBER */
        A1RND   = 0B0110001  /* ROUND UP */
        A1DEN   = 0B0110010  /* DENORMALIZED */
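The select codes listed above follow a regular pattern: each arithmetic unit owns a block of twelve consecutive codes, one per flag, in a fixed flag order. A short Python sketch of that pattern follows; `ccs_select`, `FLAGS` and `UNIT_BASE` are hypothetical names, and only the three units whose codes appear in this part of the file are included.

```python
# Flag order within each unit's block of select codes.
FLAGS = ["INT", "PE", "N", "ZR", "OV", "UF",
         "INX", "INV", "NAN", "RND", "DEN", "DIVZ"]

# First select code of each unit, taken from the listing.
UNIT_BASE = {"M1": 0b0010000, "M2": 0b0011100, "A1": 0b0101000}


def ccs_select(unit: str, flag: str) -> int:
    """Return the 7-bit SEL code for a unit's status flag."""
    return UNIT_BASE[unit] + FLAGS.index(flag)
```

The block size of twelve can be checked from the listing itself: the multiplier #2 block starts twelve codes after multiplier #1, and the ALU #1 block twelve codes after that.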


        RDB     = 0B10  /* READ BANK B */
        RDAB    = 0B00  /* READ BANKS A AND B */

/* AUXILIARY MEMORY CONTROL (FIELD AUX) */

AUX[4]
    DEFAULT = 0B1111
    /* AUXILIARY WRITE INSTRUCTIONS */
    WRITE[2]
    {
        DEFAULT = 0B11  /* NO OPERATION */
        WRR     = 0B01  /* WRITE REAL */
        WRI     = 0B10  /* WRITE IMAGINARY */
        WRRI    = 0B00  /* WRITE REAL AND IMAGINARY */
    }
    /* AUXILIARY READ INSTRUCTIONS */
    READ[2]
    {
        DEFAULT = 0B11  /* NO OPERATION */
        RDR     = 0B01  /* READ REAL */
        RDI     = 0B10  /* READ IMAGINARY */
                = 0B00  /* READ REAL AND IMAGINARY */
    }

/* MULTIPLIER ONE (FIELD M1) -AND- MULTIPLIER TWO (FIELD M2) */

M2[26]
    DEFAULT = 0B00000001000000101011000100
    /* CROSSBAR REGISTER SOURCE SELECT */


    XSEL[4]
    {
        DEFAULT = 0X0  /* HOLD (REGISTER SELECTS ITSELF) */
        HOLD    = 0X0  /* HOLD (REGISTER SELECTS ITSELF) */
        M1      = 0XA  /* SELECT MULTIPLIER #1 */
        M2      = 0X9  /* SELECT MULTIPLIER #2 */
        A1      = 0X1  /* SELECT ALU #1 */
        A2      = 0X2  /* SELECT ALU #2 */
        AR      = 0X3  /* SELECT CACHE PORT A REAL */
        AI      = 0X4  /* SELECT CACHE PORT A IMAGINARY */
        BR      = 0X5  /* SELECT CACHE PORT B REAL */
        BI      = 0X6  /* SELECT CACHE PORT B IMAGINARY */
        AUXR    = 0X7  /* SELECT AUXILIARY PORT REAL */
        AUXI    = 0X8  /* SELECT AUXILIARY PORT IMAGINARY */
        IOR     = 0XB  /* SELECT I/O PORT REAL */
        IOI     = 0XC  /* SELECT I/O PORT IMAGINARY */
        REGA    = 0XD  /* SELECT REGISTER FILE PORT A */
        REGB    = 0XE  /* SELECT REGISTER FILE PORT B */
        DIS     = 0XF  /* DISABLE PORT (REGISTER HOLDS) */
    }
    /* PORT X CONTROL */
    XCTRL[4]
    {
        DEFAULT = 0B0001  /* HOLD */
        HOLD    = 0B0001  /* HOLD */
        P0MS    = 0B0011  /* PHASE 0 MOST SIGNIFICANT */
        P1MS    = 0B1011  /* PHASE 1 MOST SIGNIFICANT */
        P0LS    = 0B0000  /* PHASE 0 LEAST SIGNIFICANT */
        P1LS    = 0B1000  /* PHASE 1 LEAST SIGNIFICANT */
        TPORT   = 0B0100  /* T PORT */
    }
    /* CROSSBAR REGISTER SOURCE SELECT */
    YSEL[4]
    {
        DEFAULT = 0X0  /* HOLD (REGISTER SELECTS ITSELF) */
        HOLD    = 0X0  /* HOLD (REGISTER SELECTS ITSELF) */
        M1      = 0XA  /* SELECT MULTIPLIER #1 */
        M2      = 0X9  /* SELECT MULTIPLIER #2 */
        A1      = 0X1  /* SELECT ALU #1 */
        A2      = 0X2  /* SELECT ALU #2 */
        AR      = 0X3  /* SELECT CACHE PORT A REAL */
        AI      = 0X4  /* SELECT CACHE PORT A IMAGINARY */
        BR      = 0X5  /* SELECT CACHE PORT B REAL */
        BI      = 0X6  /* SELECT CACHE PORT B IMAGINARY */
        AUXR    = 0X7  /* SELECT AUXILIARY PORT REAL */
        AUXI    = 0X8  /* SELECT AUXILIARY PORT IMAGINARY */
        IOR     = 0XB  /* SELECT I/O PORT REAL */
        IOI     = 0XC  /* SELECT I/O PORT IMAGINARY */
        REGA    = 0XD  /* SELECT REGISTER FILE PORT A */
        REGB    = 0XE  /* SELECT REGISTER FILE PORT B */
        DIS     = 0XF  /* DISABLE PORT (REGISTER HOLDS) */
    }
    /* PORT Y CONTROL */
    YCTRL[3]
    {
        DEFAULT = 0B001  /* HOLD */
        HOLD    = 0B001  /* HOLD */
        P0MS    = 0B011  /* PHASE 0 MOST SIGNIFICANT */
        P1MS    = 0B111  /* PHASE 1 MOST SIGNIFICANT */
        P0LS    = 0B000  /* PHASE 0 LEAST SIGNIFICANT */
        P1LS    = 0B100  /* PHASE 1 LEAST SIGNIFICANT */
    }
    /* INSTRUCTIONS FOR MULTIPLIERS */




    INS[8]
    {
        DEFAULT  = 0B01011000  /* NO OPERATION */
        NOP      = 0B01011000  /* NO OPERATION */

        /* FLOATING POINT ARITHMETIC INSTRUCTIONS */
        /* FLOATING POINT DIVISION */
        DIV      = 0B00000000  /* X/Y */
        DDIV     = 0B00000001  /* DP: X/Y */
        /* FLOATING POINT SQUARE ROOT */
        SQRTX    = 0B00000010  /* SQUARE ROOT X */
        DSQRTX   = 0B00000011  /* DP: SQUARE ROOT X */
        /* FLOATING POINT MULTIPLICATION WITH WRAPPED OPERANDS */
        MULTWX   = 0B00000100  /* WRAPPED X*Y */
        DMULTWX  = 0B00000101  /* DP: WRAPPED X*Y */
        MULTWY   = 0B00000110  /* X*WRAPPED Y */
        DMULTWY  = 0B00000111  /* DP: X*WRAPPED Y */
        /* FLOATING POINT MULT WITH ABSOLUTE VALUE CAPABILITY */
        MULT     = 0B00001000  /* X*Y */
        DMULT    = 0B00001001  /* DP: X*Y */
        MULTAY   = 0B00001010  /* X*|Y| */
        DMULTAY  = 0B00001011  /* DP: X*|Y| */
        MULTAX   = 0B00001100  /* |X|*Y */
        DMULTAX  = 0B00001101  /* DP: |X|*Y */
        MULTA    = 0B00001110  /* |X|*|Y| */
        DMULTA   = 0B00001111  /* DP: |X|*|Y| */

        /* FLOATING POINT SUPPORT INSTRUCTIONS */
        /* X INPUT RETURNED UNMODIFIED */
        PASSXM   = 0B00010000  /* X */
        DPASSXM  = 0B00010001  /* DP: X */
        /* REGISTER ACCESS INSTRUCTIONS */
        FREGMR   = 0B01011010  /* FMPY FLAG REGISTER READ */
        FREGMW   = 0B01011011  /* FMPY FLAG REGISTER WRITE */
        IREGMR   = 0B01011100  /* FMPY INT REGISTER READ */
        IREGMW   = 0B01011101  /* FMPY INT REGISTER WRITE */
        MREGMR   = 0B01011110  /* FMPY MODE REGISTER READ */
        MREGMW   = 0B01011111  /* FMPY MODE REGISTER WRITE */

        /* INTEGER ARITHMETIC INSTRUCTIONS */
        /* INTEGER MULTIPLICATION INSTRUCTIONS */
        IMULT    = 0B11111000  /* UNSIGNED X * UNSIGNED Y */
        IMULTSX  = 0B11111001  /* SIGNED X * UNSIGNED Y */
        IMULTSY  = 0B11111010  /* UNSIGNED X * SIGNED Y */
        IMULTS   = 0B11111011  /* SIGNED X * SIGNED Y */
        IMULTH   = 0B11111100  /* UNSIGNED X * UNSIGNED Y */
        IMULTHSX = 0B11111101  /* SIGNED X * UNSIGNED Y */
        IMULTHSY = 0B11111110  /* UNSIGNED X * SIGNED Y */
        IMULTHS  = 0B11111111  /* SIGNED X * SIGNED Y */
    }
    /* T OUTPUT PORT CONTROL */
    ZEN[1]
    {
        DEFAULT = 0B1  /* HOLD Z REGISTER */
        HOLD    = 0B1  /* HOLD Z REGISTER */
        EN      = 0B0  /* ENABLE Z REGISTER */
    }
    TSEL[2]
    {
        DEFAULT = 0B00  /* OUTPUT LS */
        LS      = 0B00  /* OUTPUT LS */
        MS      = 0B11  /* OUTPUT MS */
        LSMS    = 0B01  /* OUTPUT LS TO CROSSBAR MS */
        MSLS    = 0B10  /* OUTPUT MS TO CROSSBAR LS */
    }


/* ALU ONE (FIELD A1) -AND- ALU TWO (FIELD A2) */

A2
    DEFAULT = 0B000000010000001001011000100
    /* CROSSBAR REGISTER SOURCE SELECT */
    XSEL[4]
    {
        DEFAULT = 0X0  /* HOLD (REGISTER SELECTS ITSELF) */
        HOLD    = 0X0  /* HOLD (REGISTER SELECTS ITSELF) */
        M1      = 0XA  /* SELECT MULTIPLIER #1 */
        M2      = 0X9  /* SELECT MULTIPLIER #2 */
        A1      = 0X1  /* SELECT ALU #1 */
        A2      = 0X2  /* SELECT ALU #2 */
        AR      = 0X3  /* SELECT CACHE PORT A REAL */
        AI      = 0X4  /* SELECT CACHE PORT A IMAGINARY */
        BR      = 0X5  /* SELECT CACHE PORT B REAL */
        BI      = 0X6  /* SELECT CACHE PORT B IMAGINARY */
        AUXR    = 0X7  /* SELECT AUXILIARY PORT REAL */
        AUXI    = 0X8  /* SELECT AUXILIARY PORT IMAGINARY */
        IOR     = 0XB  /* SELECT I/O PORT REAL */
        IOI     = 0XC  /* SELECT I/O PORT IMAGINARY */
        REGA    = 0XD  /* SELECT REGISTER FILE PORT A */
        REGB    = 0XE  /* SELECT REGISTER FILE PORT B */
        DIS     = 0XF  /* DISABLE PORT (REGISTER HOLDS) */
    }
    /* PORT X CONTROL */
    XCTRL[4]
    {
        DEFAULT = 0B0001  /* HOLD */
        HOLD    = 0B0001  /* HOLD */
        P0MS    = 0B0011  /* PHASE 0 MOST SIGNIFICANT */
        P1MS    = 0B1011  /* PHASE 1 MOST SIGNIFICANT */
        P0LS    = 0B0000  /* PHASE 0 LEAST SIGNIFICANT */
        P1LS    = 0B1000  /* PHASE 1 LEAST SIGNIFICANT */
        TPORT   = 0B0100  /* T PORT */
    }
    /* CROSSBAR REGISTER SOURCE SELECT */
    YSEL[4]
    {
        DEFAULT = 0X0  /* HOLD (REGISTER SELECTS ITSELF) */
        HOLD    = 0X0  /* HOLD (REGISTER SELECTS ITSELF) */
        M1      = 0XA  /* SELECT MULTIPLIER #1 */
        M2      = 0X9  /* SELECT MULTIPLIER #2 */
        A1      = 0X1  /* SELECT ALU #1 */
        A2      = 0X2  /* SELECT ALU #2 */
        AR      = 0X3  /* SELECT CACHE PORT A REAL */
        AI      = 0X4  /* SELECT CACHE PORT A IMAGINARY */
        BR      = 0X5  /* SELECT CACHE PORT B REAL */
        BI      = 0X6  /* SELECT CACHE PORT B IMAGINARY */
        AUXR    = 0X7  /* SELECT AUXILIARY PORT REAL */
        AUXI    = 0X8  /* SELECT AUXILIARY PORT IMAGINARY */
        IOR     = 0XB  /* SELECT I/O PORT REAL */
        IOI     = 0XC  /* SELECT I/O PORT IMAGINARY */
        REGA    = 0XD  /* SELECT REGISTER FILE PORT A */
        REGB    = 0XE  /* SELECT REGISTER FILE PORT B */
        DIS     = 0XF  /* DISABLE PORT (REGISTER HOLDS) */
    }
    /* PORT Y CONTROL */
    YCTRL[3]
    {
        DEFAULT = 0B001  /* HOLD */
        HOLD    = 0B001  /* HOLD */
        P0MS    = 0B011  /* PHASE 0 MOST SIGNIFICANT */
        P1MS    = 0B111  /* PHASE 1 MOST SIGNIFICANT */
        P0LS    = 0B000  /* PHASE 0 LEAST SIGNIFICANT */
        P1LS    = 0B100  /* PHASE 1 LEAST SIGNIFICANT */
    }
    /* Y SELECT INTERNAL TO THE ALU */
    IYSEL[1]
    {
        DEFAULT = 0B0  /* Y REGISTER */
        ZREG    = 0B1  /* Z REGISTER */
    }

    /* ALU INSTRUCTIONS */
    INS[8]
        DEFAULT = 0B01011000  /* NO OPERATION */
        NOP     = 0B01011000  /* NO OPERATION */

        /* FLOATING POINT ARITHMETIC INSTRUCTIONS */
        /* MAXIMUM/MINIMUM */
        MIN     = 0B00100100  /* FLOATING POINT MIN */
        DMIN    = 0B00100101  /* DP: FLOATING POINT MIN */
        MAX     = 0B00100110  /* FLOATING POINT MAX */
        DMAX    = 0B00100111  /* DP: FLOATING POINT MAX */
        /* ABSOLUTE, NEGATE OR PASS X OPERAND */
        ABSX    = 0B00101000  /* |X| */
        DABSX   = 0B00101001  /* DP: |X| */
        NEGX    = 0B00101010  /* -X */
        DNEGX   = 0B00101011  /* DP: -X */
        PASSX   = 0B00101100  /* X */
        DPASSX  = 0B00101101  /* DP: X */
        /* ADDITION AND SUBTRACTION */
        ADD     = 0B00110000  /* X+Y */
        DADD    = 0B00110001  /* DP: X+Y */
        SUBTR   = 0B00110010  /* X-Y */
        DSUBTR  = 0B00110011  /* DP: X-Y */
        SUBX    = 0B00110100  /* Y-X */
        DSUBX   = 0B00110101  /* DP: Y-X */
        ADDA    = 0B00111000
        DADDA   = 0B00111001
        SUBA    = 0B00111010
        DSUBA   = 0B00111011
        SUBXA   = 0B00111100
        DSUBXA  = 0B00111101

        /* FLOATING POINT SUPPORT INSTRUCTIONS */
        /* SCALE */
        SCALE   = 0B00100000  /* EXPONENT X + Y */
        DSCALE  = 0B00100001  /* DP: EXPONENT X + Y */
        /* MERGE (CONCATENATE) */
        MERGE   = 0B00100010  /* SIGN X, EXPONENT Y, MANTISSA X */
        DMERGE  = 0B00100011  /* DP: SIGN X, EXPONENT Y, MANTISSA X */
        /* NORMALIZE X */



} 




/* IMMEDIATE DATA (FIELD IMM) */

IMM[1]
{
    DEFAULT = 0B1
    /* ENABLE FOR IMMEDIATE DATA REGISTER */
    CTRL[1]
    {
        DEFAULT = 0B0
        EN      = 0B0  /* ENABLE DATA REGISTER */
        DIS     = 0B1
    }
}

/* DATA REGISTER FILE (FIELD REG) */

REG[36]
    DEFAULT = 0X000000000
    /* PORT A INPUT SOURCE */
    ASEL[5]
    {
        DEFAULT = 0B00000  /* NO OPERATION */
        NOP     = 0B00000  /* NO OPERATION */
        CLEAR   = 0B00001  /* CLEAR REGISTER (ALL ZEROS) */
        M1      = 0B10101  /* SELECT MULTIPLIER #1 */
        M2      = 0B10011  /* SELECT MULTIPLIER #2 */
        A1      = 0B00011  /* SELECT ALU #1 */
        A2      = 0B00101  /* SELECT ALU #2 */
        AR      = 0B00111  /* SELECT CACHE PORT A REAL */
        AI      = 0B01001  /* SELECT CACHE PORT A IMAGINARY */
        BR      = 0B01011  /* SELECT CACHE PORT B REAL */
        BI      = 0B01101  /* SELECT CACHE PORT B IMAGINARY */
        AUXR    = 0B01111  /* SELECT AUXILIARY PORT REAL */
        AUXI    = 0B10001  /* SELECT AUXILIARY PORT IMAGINARY */
        IOR     = 0B10111  /* SELECT I/O PORT REAL */
        IOI     = 0B11001  /* SELECT I/O PORT IMAGINARY */
        REGA    = 0B11011  /* SELECT REGISTER FILE PORT A */
        REGB    = 0B11101  /* SELECT REGISTER FILE PORT B */
        SET     = 0B11111  /* SET REGISTER (ALL ONES) */
    }
    /* PORT A WRITE ADDRESS */
    WRA[6]
    {
        DEFAULT = 0B000000  /* REGISTER 0 */
    }
    /* PORT B INPUT SOURCE */
    BSEL[5]
    {
        DEFAULT = 0B00000  /* NO OPERATION */
        NOP     = 0B00000  /* NO OPERATION */
        CLEAR   = 0B00001  /* CLEAR REGISTER (ALL ZEROS) */
        M1      = 0B10101  /* SELECT MULTIPLIER #1 */
        M2      = 0B10011  /* SELECT MULTIPLIER #2 */
        A1      = 0B00011  /* SELECT ALU #1 */
        A2      = 0B00101  /* SELECT ALU #2 */
        AR      = 0B00111  /* SELECT CACHE PORT A REAL */
        AI      = 0B01001  /* SELECT CACHE PORT A IMAGINARY */
        BR      = 0B01011  /* SELECT CACHE PORT B REAL */
        BI      = 0B01101  /* SELECT CACHE PORT B IMAGINARY */
        AUXR    = 0B01111  /* SELECT AUXILIARY PORT REAL */
        AUXI    = 0B10001  /* SELECT AUXILIARY PORT IMAGINARY */
        IOR     = 0B10111  /* SELECT I/O PORT REAL */
        IOI     = 0B11001  /* SELECT I/O PORT IMAGINARY */
        REGA    = 0B11011  /* SELECT REGISTER FILE PORT A */
        REGB    = 0B11101  /* SELECT REGISTER FILE PORT B */
        SET     = 0B11111  /* SET REGISTER (ALL ONES) */
    }
    /* PORT B WRITE ADDRESS */
    WRB[6]
    {
        DEFAULT = 0B000000  /* REGISTER 0 */
    }
    /* SHIFT MODE CONTROL */
    SMODE[2]
    {
        DEFAULT = 0B00  /* NORMAL (REGISTER FILE MODE) */
        REG     = 0B00  /* NORMAL (REGISTER FILE MODE) */
        R8X8    = 0B01  /* 8 BY 8 SHIFT REGISTER MODE */
        R4X16   = 0B10  /* 4 BY 16 SHIFT REGISTER MODE */
        R2X32   = 0B11  /* 2 BY 32 SHIFT REGISTER MODE */
    }

/*  PORT  A  READ  ADDRESS  */ 

^[6] 

^  DEFAULT  •  OBOOOOOO  /*  REOISTER  0  */ 

/*  PORT  B  READ  ADDRESS  */ 
mB[6] 

DEFAULT  •  OBOOOOOO  /*  REGISTER  0  */ 


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APPENDIX  F 


IOP DEFINITION FILE


Date: 6/30/92
Size: 10968


File: IOPMGA.DEF
Last Modified: Tue Jun 02 16:27:38 1992


THIS FILE INCLUDES HARDWARE CHANGES AS OF MAY 30, 1992


/*IOP Microprogram sequencer IOPMGA.DEF 2 June 1992 */
/*BIT ASSIGNMENTS FOR MICROSEQUENCER*/

WIDTH  = 48
PHASES = 1
DEFBIT = 0

^  >  DATA  BUS  SOURCE  AND  DESTINATION  SELECT  (FIELD  CR) 


FAULT  «  OBIOOOOOO 
DESTr4I 

{  DEFAULT  -  OB 


BUSr3] 

{  DEFAULT 


/•DESTINATION  FOR  DATA^/ 

Q 

0  /•NO  WRITER/ 

0  /•MACRO  RAM  ADDRESS  INPUT  REGISTER  WRITE^/ 

1  /•MACRO  RAM  COUNTER  ADDRESS  WRITER  / 

0  /•MACRO  RAM  COUNTER  ADDRESS  INPUT  REGISTER  WRITER  / 
0  /•INCREMENT  MACRO  RAM  ADDRESS^/ 

1  /•MACRO  RAM  WRITE^/ 

1  /•CONTROL  REGISTER  WRITER/ 

/•DATA  BUS  SOURCE  SELECT^/ 

/•NO  BUS  SELECTED 
/•MICROPROGRAM  RAM^/ 

/•IBM  PC  INTERFACE^/ _ 

/•SERIAL  10  (VME)  INTERFACE*/ 

/•HIGH  SPEED  10  INTERFACE*/ 

/•MACRO  RAM*/ 

/•BOOT  INSTRUCTION*/ 


/* REGISTER ADDRESS (FIELD CRA) */

CRA[4]
{ DEFAULT = 0B0000
REGAD[4]
{ DEFAULT = 0B0000
    SOURCE  = 0X0  /*RESOURCES IN USE*/
    PCSTAT  = 0X1  /*PC INTERFACE CONTROL AND STATUS*/
    PCTRAN  = 0X2  /*PC TRANSMIT CONTROL*/
    PCIMK   = 0X3  /*PC INTERRUPT MASK*/
    SIO     = 0X4  /*SERIAL INTERFACE CONTROL*/
    SIOTR   = 0X5  /*SERIAL INTERFACE TRANSMIT CONTROL*/
    SIOMK   = 0X6  /*SERIAL TRANSMIT INTERRUPT MASK*/
    HSIO    = 0X7  /*HIGH SPEED IO INTERFACE CONTROL*/
    CNTA    = 0X8  /*A COUNTER DATA TRANSFER COUNT*/
    CNTB    = 0X9  /*B COUNTER DATA TRANSFER COUNT*/
    MRAMAR  = 0XA  /*MACRO RAM ADDRESS REGISTER*/
    MRAMACT = 0XB  /*MACRO RAM ADDRESS COUNTER*/
    MRAMCT  = 0XC  /*MACRO RAM COUNTER REGISTER*/
    HSIOAC  = 0XD  /*CPH HSIO ADDRESS COUNTER*/
    HSIOSAR = 0XE  /*CPH HSIO SYSTEM ADDRESS REGISTER*/
    IOPCR   = 0XF  /*IOP CONTROL REGISTER*/
}


CONDITION  SELECT  (FIELD  CCS) 


{  DEFAULT  «  OBOOOOOO 
CCODE[6] 


60  - 

PCTFF 

-  OBOOOOOO 

81  - 

PCTAPF 

-  OBOOOOOl 

62  - 

PCTAEF 

-  OBOOOOll 

83  - 

84  - 

85  - 

PCRFF 

-  OBOOOlOl 

86  - 

87  - 

68  - 

69  • 

PCREF 

-  OBOlOOll 

90  - 

SIOTF 

«  OBOOlOlO 

91  - 

92  - 

93  - 

94  - 

SIOTB 

-  OBOOlllO 

95  - 

SIORP 

-  OBOOllll 

96  > 

97  - 

96  - 

99  > 

SIORE 

»  OBOlOlll 

100  - 

ZCNTA 

•  OBOlOlOO 

101  - 

ZCNTB 

-  OBOIOIOI 

102  - 

HSIOR 

-  OBOllOOO 

103  - 

HSIOT 

-  OBOllOOl 

104  - 

SIOR 

-  OBOllOlO 

105  - 

8IOT 

«  OBOllOll 

106  - 

PCR 

«  OBOlllOO 

107  - 

PCT 

-  OBOlllOl 

108 

H5IOB 

-  OBOllllO 

109  - 

8YSI0 

•  OBOlllll 

/•IBM-PC  TRANSMIT  FULL  FLAG*/ 

/•IBM-PC  TRANSMIT  ALMOST  FULL  FLAG*/ 
/•IBM-PC  TRANSMIT  HALF  FULL  FLAG*/ 
/•IBM-PC  TRANSMIT  ALMOST  EMPTY  FLAG*/ 

/•IBM-PC  RECEIVE  FULL  FLAG*/ 

/•IBM-PC  RECEIVE  ALMOST  FULL  FLAG*/ 
/•IBM-PC  RECEIVE  HALF  FULL  FLAG*/ 
/•IBM-PC  RECEIVE  ALMOST  EMPTY  FLAG*/ 
/•IBM-PC  RECEIVE  EMPTY  FLAG*/ 

/•SIO  TRANSMIT  FULL  FLAG*/ 

/•SIO  TRANSMIT  ALMOST  FULL  FLAG*/ 
/•SIO  TRANSMIT  HALF  FULL  FLAG*/ 

/•SIO  TRANSMIT  ALMOST  EMPTY  FLAG*/ 
/•SIO  TRANSMIT  EMPTY  FLAG*/ 

/•SIO  RECEIVE  PULL  FLAG*/ 

/•SIO  RECEIVE  ALMOST  FULL  FLAG*/ 
/•SIO  RECEIVE  HALF  FULL  FLAG*/ 

/•SIO  RECEIVE  ALMOST  EMPTY  FLAG*/ 
/•SIO  RECEIVE  EMPTY  FLAG*/ 

/•COUNTER  A  ZERO*/ 

/•COUNTER  B  ZERO*/ 

/•HSIO  RECEIVE  INTERFACE  AVAILABLE*/ 
/•HSIO  TRANSMIT  INTERFACE  AVAILABLE*/ 
/•SIO  RECEIVE  INTERFACE  AVAILABLE*/ 
/•SIO  TRANSMIT  INTERFACE  AVAILABLE*/ 
/•PC  RECEIVE  INTERFACE  AVAILABLE*/ 

/•PC  TRANSMIT  INTERFACE  AVAILABLE*/ 
/•HSIO  BUS  BUSY*/ 

/•SYSTEM  INTERRUPT  0  •/ 
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-  */ 


SYSI1   = 0B100000   /*SYSTEM INTERRUPT 1*/
SYSI2   = 0B100001   /*SYSTEM INTERRUPT 2*/
SYSI3   = 0B100010   /*SYSTEM INTERRUPT 3*/
SYSI4   = 0B100011   /*SYSTEM INTERRUPT 4*/
SYSI5   = 0B100100   /*SYSTEM INTERRUPT 5*/
SYSI6   = 0B100101   /*SYSTEM INTERRUPT 6*/
SYSI7   = 0B100110   /*SYSTEM INTERRUPT 7*/


Microsequencer Instructions & Branch Address/Data


SEQ[31] /*BE CAREFUL WITH LOWER 24-BITS*/
{ DEFAULT = 0X0000000

INSTR[7]
{ DEFAULT = 0B0000000

/*MICROSEQUENCER INSTRUCTION*/

CONT    = 0B0000000   /*CONTINUE*/
IDLE    = 0B0010000
IBC     = 0B0100101   /*ENABLE INSTRUCTION BOLD CONTROL*/
WCS     = 0B0100000   /*WRITE CONTROL STORE*/
JPCOP   = 0B0010101   /*IF FLAG, JUMP PC (SELF)*/
JPCNF   = 0B0110101   /*IF NOT FLAG, JUMP PC (SELF)*/
REL8    = 0B0100110   /*RELATIVE ADDRESS WIDTH 8, TERMINATES IBC*/

/*INTERRUPT CONTROL*/

CCIR    = 0B0010001   /*CLEAR CURRENT INTERRUPT*/
CAIR    = 0B0000001   /*CLEAR ALL INTERRUPTS*/
IRMBC   = 0B0010011   /*IR MASK BITWISE CLEAR*/
IRMBS   = 0B0010010   /*IR MASK BITWISE SET*/
DISIR   = 0B0010110   /*DISABLES INTERRUPTS*/
ENAIR   = 0B0110110   /*ENABLES INTERRUPTS*/
RDIV    = 0B0101101   /*READ IV AND INCREMENT IV POINTER*/
WRIV    = 0B0001101   /*WRITE IV AND INCREMENT IV POINTER*/
RTNIR   = 0B0000011   /*RETURN FROM INTERRUPT*/
SLRIVP  = 0B0011101   /*WRITES STACK LIMIT REGISTER AND IV POINTER*/
SLIR    = 0B0010111   /*SELECTS LATCHED INTERRUPTS*/
STIR    = 0B0110111   /*SELECTS TRANSPARENT INTERRUPTS*/
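
The SEQ field above is 31 bits wide: a 7-bit instruction code plus the 24-bit REG field. As a minimal sketch only, the packing can be illustrated in Python. The helper names `pack_seq` and `unpack_seq` are hypothetical, the placement of the instruction code above the REG bits is an assumption (the listing only warns about the lower 24 bits), and the two opcode values are read from the listing's column order.

```python
# Sketch only: pack a 31-bit SEQ word from a 7-bit instruction and a 24-bit REG.
# ASSUMPTION: the instruction occupies bits 30..24 and REG occupies bits 23..0.

REG_BITS = 24
REG_MASK = (1 << REG_BITS) - 1   # 0xFFFFFF, also the REG default in the listing

# Opcode values as read from the listing (pairing follows column order).
INSTR = {
    "CONT": 0b0000000,    # continue
    "JPCNF": 0b0110101,   # if not flag, jump
}


def pack_seq(mnemonic, reg=REG_MASK):
    """Return the SEQ word for an instruction mnemonic and a REG value."""
    if not 0 <= reg <= REG_MASK:
        raise ValueError("REG must fit in 24 bits")
    return (INSTR[mnemonic] << REG_BITS) | reg


def unpack_seq(word):
    """Split a SEQ word back into (instruction code, REG value)."""
    return word >> REG_BITS, word & REG_MASK
```

For a conditional jump the REG value would carry the branch address, for example `pack_seq("JPCNF", 0x000010)`.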


BRANCH ADDRESS OR WRITE ENABLES & DATA WRITES (FIELD REG)


************************************************************

NOTE: REG bits function as follows:

Bits 23..12 - Data, useq branch address, or
              write enables for CRA Field
Bits 11..0  - Data, useq branch address, or
              data to write for CRA Field


******* EXCEPTIONS *******


CRA Fields 8 & 9 (CNTA & CNTB) use bits 19..0
as 20 bit counters

CRA Field 10 (MRAMAR) uses bits 12..0 as a 13 bit
Macro RAM Address

CRA Field 13 (HSIOAC) uses bits 23..0 as a 24 bit
CPH I/O and memory space address counter
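
The bit assignments in the note above can be sketched in Python as follows. This is an illustration under stated assumptions, not the assembler's own logic: the function names are hypothetical, and the code simply applies the bit ranges quoted in the note.

```python
# Sketch only: split and combine the 24-bit REG field using the bit ranges
# given in the note. All function names are hypothetical.

def pack_reg(enables, data):
    """Bits 23..12 carry the write enables, bits 11..0 the data to write."""
    return ((enables & 0xFFF) << 12) | (data & 0xFFF)


def split_reg(reg):
    """Return (enables, data) from a 24-bit REG value."""
    return (reg >> 12) & 0xFFF, reg & 0xFFF


def counter_value(reg):
    """Counter fields (CNTA, CNTB) use bits 19..0 as a 20 bit counter."""
    return reg & 0xFFFFF


def macro_ram_address(reg):
    """MRAMAR uses bits 12..0 as a 13 bit Macro RAM address."""
    return reg & 0x1FFF
```

For example, an operand pair of `0XF7F` and `0X000` packs to `0xF7F000`, and the full 24-bit value is used unchanged when the field is the 24 bit address counter.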


REG[24]
{ DEFAULT = 0XFFFFFF
LABEL


ENABLES[12]
< DEFAULT = 0B111111111111
ALL     = 0XF00   /*Enables registers 12-19*/
NONE    = 0XFFF   /*Disables registers 12-23*/
ENR20   = 0XEFF   /*Enables register bit 20*/
ENR19   = 0XF7F   /*Enables register bit 19*/
ENR18   = 0XFBF   /*Enables register bit 18*/
ENR17   = 0XFDF   /*Enables register bit 17*/
ENR16   = 0XFEF   /*Enables register bit 16*/
ENR15   = 0XFF7   /*Enables register bit 15*/
ENR14   = 0XFFB   /*Enables register bit 14*/
ENR13   = 0XFFD   /*Enables register bit 13*/
ENR12   = 0XFFE   /*Enables register bit 12*/
HSIOINT = 0XFF4   /*HSIO source/data enable*/
PCXSET  = 0XFB7   /*PCXMIT clock/data enable*/
>
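
The enable values in this table are consistent with an active-low mask in which enable bit n-12 is cleared to enable register bit n. That reading is inferred from the table values (for instance ALL = 0XF00 and NONE = 0XFFF) rather than stated in the listing. A minimal Python sketch with hypothetical function names:

```python
# Sketch only: derive the 12-bit ENABLES mask for a set of register bits.
# ASSUMPTION: a cleared mask bit enables the corresponding register bit
# (inferred from the table values, not stated in the listing).

def enable_mask(*reg_bits):
    """Return the ENABLES value that enables the given register bits (12..23)."""
    mask = 0xFFF
    for bit in reg_bits:
        if not 12 <= bit <= 23:
            raise ValueError("register bit must be in 12..23")
        mask &= ~(1 << (bit - 12))
    return mask


def enabled_bits(mask):
    """Return the register bits that a given ENABLES value turns on."""
    return [n + 12 for n in range(12) if not (mask >> n) & 1]
```

Under this reading HSIOINT (0XFF4) enables register bits 12, 13 and 15, and PCXSET (0XFB7) enables bits 15 and 18.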

DATA[12]
< DEFAULT = 0B111111111111
CLR     = 0XFFF   /*Clears enabled data bits 0-11*/
SET     = 0X000   /*Sets enabled data bits 0-11*/
PSDATA  = 0XFD7   /*selects real 32 bit data\clk a (PCXSET CRA 2 -PCTRAN)*/
                  /*selects real 32 bit data\clk a (PCXSET CRA 5 -SIOTR)*/
SINT    = 0XFF3   /*Enables receive FF (ENR13 CRA 4 -SIO)*/
SIORFF  = 0XFDF   /*Enables SIO receive FF (ENR13 CRA 6 -SIOHK)*/
SIOTFF  = 0XFFE   /*Enables SIO transmit FF (ENR13 CRA 6 -SIOHK)*/
HIOPC   = 0XFDA   /*Selects PC as HSIO source 32 bit real,
                    CLK A, data (HSIOINT CRA 7 -HSIO)*/
HIOCPH  = 0XFCA   /*Selects CPH as HSIO source 32 bit real,
                    CLK A data (HSIOINT CRA 7 -HSIO)*/
HIOSIO  = 0XFEA
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APPENDIX  6 

VME  ADDRESS  CONTROL 


6-1 


Date: 2/ 9/92
Size: 4916


File: A:HEXADDR.MAP
Last Modified: Thu Feb 20 11:50:50 1992


0 - 7FFF or FFFF     | EPROM
4 0000 - 5 FFFF      | 020 SRAM
8 0000 - 8 3FFF      | 020 4-PORT SRAM                                              |      | lll,uil,OXO,Oi/j
C 0000 - C 0FFF      | ZORAN #1
C 1000 - C 1FFF      | ZORAN #2
10 0000 - 11 FFFF    | ZORAN BUS #1 PRAM                                            | BS4  | 111,0X0,00X
14 0000 - 14 0FFF    | ZORAN #3                                                     | BS5  | 111,0X0,00X
14 1000 - 14 1FFF    | ZORAN #4                                                     | BS5  | 111,0X0,00X
18 0000 - 19 FFFF    | ZORAN BUS #2 PRAM                                            | BS6  | 111,0X0,00X
1C 0000              | VPH STATUS LATCH                                             | BS7  | 111,0X0,00X
1C 0004              | ZORAN RESET LATCH                                            | BS7  | 111,0X0,00X
20 0000              | CPH ADDRESS (FC=011 FOR AUGMENTED XFERS) (SELECTABLE AT CPH) | BS8  | 111,0X0,00X
20 0002              | REQUEST (WRITE) OR RELINQUISH (READ) VME BUS (BYTE OR WORD)  | BS8  | 111,0X0,00X
20 0004              | DHB FLAG (BYTE OR WORD READ BIT D0)                          | BS8  | 111,0X0,00X
20 0006              | AUGMENTED XFER ADDRESS COUNTER LOAD ADDRESS (WORD)           | BS8  | 111,0X0,00X
24 0000              | PC INTERFACE FIFO (WORD)                                     | BS9  | 111,0X0,00X
24 0002              | PC INTERFACE STATUS/CONTROL REGISTER (WORD)                  | BS9  | 111,0X0,00X
24 0004              | PC INTERFACE INTERRUPT REGISTER (WORD)                       | BS9  | 111,0X0,00X
28 0001 - 28 001B    | MVME6000 LCSR (ODD BYTES)                                    | BS10 | 111,0X0,00X
28 0021 - 28 002F    | MVME6000 GCSR (ODD BYTES)                                    | BS10 | 111,0X0,00X
2000 0000            | DSACK SRAM ENABLE
6000 0000            | DSACK SRAM DISABLE


1 


Date: 2/ 9/92
Size: 3105


File: A:HEXADL2.MAP
Last Modified: Thu Jan 23 13:45:52 1992


VPH HEX ADDRESS MAP


HEX ADDRESS          | RESOURCE

0 - 7FFF or FFFF     | EPROM
4 0000 - 5 FFFF      | 020 SRAM
8 0000 - 8 3FFF      | 020 4-PORT SRAM
C 0000 - C 0FFF      | ZORAN #1
C 1000 - C 1FFF      | ZORAN #2
10 0000 - 11 FFFF    | ZORAN BUS #1 PRAM
14 0000 - 14 0FFF    | ZORAN #3
14 1000 - 14 1FFF    | ZORAN #4
18 0000 - 19 FFFF    | ZORAN BUS #2 PRAM
1C 0000              | VPH STATUS LATCH
1C 0004              | ZORAN RESET LATCH
20 0000              | CPH ADDRESS (FC=011 FOR AUGMENTED XFERS) (SELECTABLE AT CPH)
20 0002              | REQUEST (WRITE) OR RELINQUISH (READ) VME BUS (BYTE OR WORD)
20 0004              | DHB FLAG (BYTE OR WORD READ BIT D0)
20 0006              | AUGMENTED XFER ADDRESS COUNTER LOAD ADDRESS (WORD)
24 0000              | PC INTERFACE FIFO (WORD)
24 0002              | PC INTERFACE STATUS/CONTROL REGISTER (WORD)
24 0004              | PC INTERFACE INTERRUPT REGISTER (WORD)
28 0001 - 28 001B    | MVME6000 LCSR (ODD BYTES)
28 0021 - 28 002F    | MVME6000 GCSR (ODD BYTES)
2000 0000            | DSACK SRAM ENABLE
6000 0000            | DSACK SRAM DISABLE
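
A map of this kind is typically used as a range lookup. The following Python sketch shows one way to express it; the table name `VPH_MAP` and the function `lookup` are hypothetical, only a subset of the rows is included, and the EPROM upper bound uses the smaller of the two sizes given in the map.

```python
# Sketch only: range lookup over a subset of the VPH hex address map.
# VPH_MAP and lookup() are illustrative names, not part of the original files.

VPH_MAP = [
    (0x000000, 0x007FFF, "EPROM"),              # map says "0 - 7FFF or FFFF"
    (0x040000, 0x05FFFF, "020 SRAM"),
    (0x080000, 0x083FFF, "020 4-PORT SRAM"),
    (0x0C0000, 0x0C0FFF, "ZORAN #1"),
    (0x0C1000, 0x0C1FFF, "ZORAN #2"),
    (0x100000, 0x11FFFF, "ZORAN BUS #1 PRAM"),
    (0x140000, 0x140FFF, "ZORAN #3"),
    (0x141000, 0x141FFF, "ZORAN #4"),
    (0x180000, 0x19FFFF, "ZORAN BUS #2 PRAM"),
    (0x1C0000, 0x1C0000, "VPH STATUS LATCH"),
    (0x1C0004, 0x1C0004, "ZORAN RESET LATCH"),
    (0x280001, 0x28001B, "MVME6000 LCSR"),      # odd bytes only
    (0x280021, 0x28002F, "MVME6000 GCSR"),      # odd bytes only
]


def lookup(address):
    """Return the resource name that contains address, or None if unmapped."""
    for low, high, name in VPH_MAP:
        if low <= address <= high:
            return name
    return None
```

The sketch does not model the odd-byte restriction on the MVME6000 registers or the DSACK enable and disable addresses.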



Date: 2/ 9/92
Size: 759


File: H:MVME6000.DOC
Last Modified: Tue Sep 17 09:16:26 1991


MVME6000 Configuration Information


The Base Address (BA) of the MVME6000 in the 020 address space is
BA = $0028 0000.

Power-up Sequence:

Write $0 to BA + $01 - this clears the BRDFAIL bit in the
LCSR. Also selects priority arbitration.

Write $20 to BA + $05 - this tells the MVME6000 that the
local processor is a 68020.

Write $6A to BA + $09 - this sets up all bus timers (see
page 4-10 in MVME6000 manual).

Write $8D to BA + $0D - this configures the MVME6000 to use
AM code $0D for all VMEbus master transactions.
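
The power-up sequence above amounts to four byte writes at fixed offsets from the base address. As a minimal sketch, with the hardware access abstracted behind a caller-supplied `write_byte` function (a hypothetical name, as are `POWER_UP_WRITES` and `power_up`):

```python
# Sketch only: the four power-up writes expressed as (offset, value) pairs.
# write_byte is a caller-supplied function; no real hardware access is implied.

BA = 0x00280000  # base address of the MVME6000 in the 020 address space

POWER_UP_WRITES = [
    (0x01, 0x00),  # clear the board-fail bit, select priority arbitration
    (0x05, 0x20),  # local processor is a 68020
    (0x09, 0x6A),  # bus timers
    (0x0D, 0x8D),  # AM code $0D for VMEbus master transactions
]


def power_up(write_byte, base=BA):
    """Issue the power-up writes in order through write_byte(address, value)."""
    for offset, value in POWER_UP_WRITES:
        write_byte(base + offset, value)
```

All four target addresses are odd bytes inside the 28 0001 - 28 001B range that the address map assigns to the LCSR.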

22  - 


APPENDIX H

IOP PROGRAMS


Size: 576     Last Modified: Wed Jun 10 08:06:32 1992

/* IOPBOOT.ASM          CREATED: 6/1/92
                        LAST MODIFIED: 6/2/92

   THIS SUBROUTINE IS INTENDED TO WAKE UP
   THE IOP FROM THE IBMPC INTERFACE. IT IS ASSUMED
   THAT THE BOOT STATE MACHINE PAL IS PROGRAMMED SUCH
   THAT THE IBMPC INTERFACE IS SELECTED AS THE HOST

*/


PROGRAM CODESEG IORAM
        ORG 0
START:  $SEQ CONT;              /*NECESSARY CONT INSTRUCTION FOR USEQ*/
        $SEQ CONT
        $SEQ.REG ALL,CLR
        $CRA SOURCE;            /*CLEAR ALL RESOURCE FLAGS*/

        /*INITIALIZE INTERRUPT VECTOR POINTERS*/


PROGRAM ENDS


Date: 6/30/92
Size: 2631


File: DNLDIOP.ASM

Last Modified: Wed Jun 10 12:29:08 1992


/*
******************************************
*                                        *
*  THIS PROGRAM NEEDS TO BE MODIFIED TO  *
*  LOAD THE COUNTER A ZERO INTERRUPT     *
*  VECTOR TO POINT TO A SERVICE ROUTINE  *
*  THAT WILL TERMINATE THE TRANSFER      *
*                                        *
******************************************
*/

/* DNLDIOP.ASM

   CREATED: 6/2/92
   LAST MODIFIED:

   THIS SUBROUTINE IS INTENDED TO DOWNLOAD PC
   MICRO CODE TO THE IOP MACRO RAM. A REAL COUNTER
   VALUE WILL NEED TO BE USED TO LOAD COUNTER A. THE
   MAXIMUM COUNT NUMBER IS USED HERE.

*/

PROGRAM CODESEG IORAM
        ORG 0

START:  $SEQ DISIR;             /*DISABLE USEQUENCER INTERRUPTS*/

LOOP:   $SEQ CONT               /*IS RESOURCE AVAILABLE??*/
        $CRA SOURCE
        $CCS ZCNTA;             /*CHECK FOR COUNTER A = 0 */
        $SEQ JPCNF, LOOP;       /*WAIT FOR COUNTER A TO COUNT TO ZERO*/

LOOP1:  $SEQ CONT               /*IS RESOURCE AVAILABLE??*/
        $CRA SOURCE
        $CCS PCT;               /*CHECK FOR PC TRANSMIT AVAILABILITY*/
        $SEQ JPCNF, LOOP1;      /*WAIT FOR PC TRANSMIT AVAILABLE*/

        $SEQ CONT
        $CR CRW                 /*WRITE CONTROL REGISTER */
        $CRA SOURCE
        $SEQ.REG ENR19, SET;    /*SET COUNTER A BUSY FLAG*/

        $SEQ CONT
        $CR CRW                 /*WRITE CONTROL REGISTER */
        $CRA SOURCE
        $SEQ.REG ENR12, SET;    /*SET IBM-PC SEND BUSY FLAG*/

        $SEQ CONT
        $CRA CNTA
        $SEQ.REG 0B000000000001, 0B111111111111;  /*LOAD COUNTER A WITH MAXIMUM*/

        $SEQ CONT
        $CR CRW                 /*WRITE CONTROL REGISTER */
        $CRA PCTRAN
        $SEQ.REG ENR12,SET;     /*RESET PC XMIT INTERFACE*/

        $SEQ CONT
        $CR CRW                 /*WRITE CONTROL REGISTER */
        $CRA PCTRAN
        $SEQ.REG ENR12,CLR;     /*READY PC XMIT INTERFACE*/

        $SEQ CONT
        $CR CRW
        $CRA PCTRAN
        $SEQ.REG PCXSET,PSDATA; /*SET PC XMIT FOR CLK A\32 BIT REAL DATA*/

        $SEQ CONT;              /* *** MYSTERY CODE TO LOAD COUNTER A=0 INT VECTOR *** */

        $SEQ CONT
        $CR MWR,IBMPC;          /*IBM PC SELECTED AS SOURCE-MACRO RAM DESTINATION*/

        $SEQ CONT
        $CR CRW
        $CRA PCTRAN
        $SEQ.REG ENR17, SET;    /*SET PGO BIT (LOW) TO BEGIN TRANSFER*/

        $SEQ ENAIR;             /*ENABLE USEQUENCER INTERRUPTS*/


/*COUNTER A ZERO INTERRUPT FUNCTION:
        DISABLE INTERRUPTS
        CLEAR PGO BIT
        RESET ALL BUSY FLAGS
        RESET COUNTER INTERRUPT VECTOR
        RESET ALL USED PORTS
        ENABLE INTERRUPTS
*/

PROGRAM ENDS
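
The opening of this program follows a poll-then-claim pattern: each required resource is polled with a conditional jump back to its own label until its flag is true, and only then are the busy flags written. The Python sketch below models that control flow only. The names `wait_until` and `acquire`, the poll limit, and the callback interface are all assumptions for illustration; the microcode itself has no timeout.

```python
# Sketch only: the poll-then-claim pattern used at the start of the program.
# wait_until() mirrors a "$SEQ JPCNF, LOOP" spin; acquire() then sets busy flags.

def wait_until(flag_is_set, max_polls=10_000):
    """Poll flag_is_set() until it returns True; return the failed poll count."""
    for polls in range(max_polls):
        if flag_is_set():
            return polls
    raise TimeoutError("condition flag never became true")


def acquire(flags, read_flag, set_busy):
    """Wait for every flag in order, then mark each resource busy."""
    for flag in flags:
        wait_until(lambda: read_flag(flag))
    for flag in flags:
        set_busy(flag)
```

A caller would pass the condition names being polled, for example `acquire(["ZCNTA", "PCT"], read_flag, set_busy)`, with `read_flag` and `set_busy` supplied by whatever models the hardware.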




Date: 6/30/92
Size: 3077


File: UPLDDAT.ASM

Last Modified: Wed Jun 10 12:11:12 1992


/*
******************************************
*                                        *
*  THIS PROGRAM NEEDS TO BE MODIFIED TO  *
*  LOAD THE COUNTER A ZERO INTERRUPT     *
*  VECTOR TO POINT TO A SERVICE ROUTINE  *
*  THAT WILL TERMINATE THE TRANSFER      *
*                                        *
******************************************
*/

/* UPLDDAT.ASM

   CREATED: 6/2/92
   LAST MODIFIED:

   THIS SUBROUTINE IS INTENDED TO UPLOAD CACHE
   DATA TO AN IBM PC. A REAL COUNTER
   VALUE WILL NEED TO BE USED TO LOAD COUNTER A.
   MAXIMUM COUNT NUMBER IS USED HERE.

*/

PROGRAM CODESEG IORAM
        ORG 0

START:  $SEQ DISIR;             /*DISABLE USEQUENCER INTERRUPTS*/

LOOP:   $SEQ CONT               /*IS RESOURCE AVAILABLE??*/
        $CRA SOURCE
        $CCS ZCNTA;             /*CHECK FOR COUNTER A = 0 */
        $SEQ JPCNF, LOOP;       /*WAIT FOR COUNTER A TO COUNT TO ZERO*/

LOOP1:  $SEQ CONT               /*IS RESOURCE AVAILABLE??*/
        $CRA SOURCE
        $CCS PCR;               /*CHECK FOR PC RECEIVE AVAILABILITY*/
        $SEQ JPCNF, LOOP1;      /*WAIT FOR PC RECEIVE AVAILABLE*/

LOOP2:  $SEQ CONT
        $CRA SOURCE
        $CCS HSIOT;             /*CHECK FOR HSIO TRANSMIT AVAILABILITY*/
        $SEQ JPCNF, LOOP2;      /*WAIT FOR HSIO TRANSMIT AVAILABLE*/

        $SEQ CONT
        $CR CRW                 /*WRITE CONTROL REGISTER */
        $CRA SOURCE
        $SEQ.REG ENR19, SET;    /*SET COUNTER A BUSY FLAG*/

        $SEQ CONT
        $CR CRW                 /*WRITE CONTROL REGISTER */
        $CRA SOURCE
        $SEQ.REG ENR13, SET;    /*SET IBM-PC RECEIVE BUSY FLAG*/

        $SEQ CONT
        $CR CRW                 /*WRITE CONTROL REGISTER */
        $CRA SOURCE
        $SEQ.REG ENR16, SET;    /*SET HSIO SEND BUSY FLAG*/

        $SEQ CONT
        $CRA CNTA
        $SEQ.REG 0X001, 0XFFF;  /*LOAD COUNTER A WITH MAXIMUM*/

        $SEQ CONT
        $CR CRW
        $CRA PCSTAT
        $SEQ.REG ENR12,SET;     /*RESET PC RECEIVE INTERFACE*/

        $SEQ CONT
        $CR CRW
        $CRA PCSTAT
        $SEQ.REG ENR12,CLR;     /*READY PC RECEIVE INTERFACE*/

        $SEQ CONT
        $CR CRW
        $CRA HSIO               /*SET HSIO FOR MEMORY READ, CLK A*/
        $SEQ.REG 0XFEC, 0XF7A;  /*32 BIT REAL DATA*/

        $SEQ CONT
        $CR CRW
        $CRA PCSTAT
        $SEQ.REG ENR19,SET;     /*SET PC RECEIVE TO ALLOW INTERFACE TO SEND*/

        $SEQ CONT;              /* *** MYSTERY CODE TO LOAD COUNTER A=0 INT VECTOR *** */

        $SEQ CONT
        $CR ,HSIO;              /*HSIO SELECTED AS SOURCE*/

        $SEQ CONT
        $CR CRW
        $CRA HSIO
        $SEQ.REG ENR14, SET;    /*SET HGO BIT (LOW) TO BEGIN TRANSFER*/

        $SEQ ENAIR;             /*ENABLE USEQUENCER INTERRUPTS*/


/*COUNTER A ZERO INTERRUPT FUNCTION:
        DISABLE INTERRUPTS
        CLEAR HGO BIT
        RESET ALL BUSY FLAGS
        RESET COUNTER INTERRUPT VECTOR
        RESET ALL USED PORTS
        ENABLE INTERRUPTS
*/

PROGRAM ENDS
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Date:  6/30/92 
Size: 3333


File: DNLDCPH.ASM

Last Modified: Wed Jun 10 12:09:14 1992
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-  /* 


******************************************
*                                        *
*  THIS PROGRAM NEEDS TO BE MODIFIED TO  *
*  LOAD THE COUNTER A ZERO INTERRUPT     *
*  VECTOR TO POINT TO A SERVICE ROUTINE  *
*  THAT WILL TERMINATE THE TRANSFER      *
*                                        *
******************************************


*/ 

/* DNLDCPH.ASM

   CREATED: 6/2/92
   LAST MODIFIED:

   THIS SUBROUTINE IS INTENDED TO DOWNLOAD PC
   MICRO CODE TO THE CPH. A REAL COUNTER
   VALUE WILL NEED TO BE USED TO LOAD COUNTER A.
   MAXIMUM COUNT NUMBER IS USED HERE.

*/


PROGRAM CODESEG IORAM
        ORG 0

START:  $SEQ DISIR;             /*DISABLE USEQUENCER INTERRUPTS*/

LOOP:   $SEQ CONT               /*IS RESOURCE AVAILABLE??*/
        $CRA SOURCE
        $CCS ZCNTA;             /*CHECK FOR COUNTER A = 0 */
        $SEQ JPCNF, LOOP;       /*WAIT FOR COUNTER A TO COUNT TO ZERO*/

LOOP1:  $SEQ CONT               /*IS RESOURCE AVAILABLE??*/
        $CRA SOURCE
        $CCS PCT;               /*CHECK FOR PC TRANSMIT AVAILABILITY*/
        $SEQ JPCNF, LOOP1;      /*WAIT FOR PC TRANSMIT AVAILABLE*/

LOOP2:  $SEQ CONT
        $CRA SOURCE
        $CCS HSIOR;             /*CHECK FOR HSIO RECEIVE AVAILABILITY*/
        $SEQ JPCNF, LOOP2;      /*WAIT FOR HSIO RECEIVE AVAILABLE*/

        $SEQ CONT
        $CR CRW                 /*WRITE CONTROL REGISTER */
        $CRA SOURCE
        $SEQ.REG ENR19, SET;    /*SET COUNTER A BUSY FLAG*/

        $SEQ CONT
        $CR CRW                 /*WRITE CONTROL REGISTER */
        $CRA SOURCE
        $SEQ.REG ENR12, SET;    /*SET IBM-PC SEND BUSY FLAG*/

        $SEQ CONT
        $CR CRW                 /*WRITE CONTROL REGISTER */
        $CRA SOURCE
        $SEQ.REG ENR17, SET;    /*SET HSIO RECEIVE BUSY FLAG*/

        $SEQ CONT
        $CRA CNTA
        $SEQ.REG 0X001, 0XFFF;  /*LOAD COUNTER A WITH MAXIMUM*/

        $SEQ CONT
        $CR CRW
        $CRA PCTRAN
        $SEQ.REG ENR12,SET;     /*RESET PC XMIT INTERFACE*/

        $SEQ CONT
        $CR CRW
        $CRA PCTRAN
        $SEQ.REG ENR12,CLR;     /*READY PC XMIT INTERFACE*/

        $SEQ CONT
        $CR CRW
        $CRA PCTRAN
        $SEQ.REG PCXSET,PSDATA; /*SET PC XMIT FOR CLK A\32 BIT REAL DATA*/

        $SEQ CONT
        $CR CRW
        $CRA HSIO
        $SEQ.REG ALL,CLR;       /* RESET THE HSIO INTERFACE */

        $SEQ CONT               /*SET HSIO TO RECEIVE PC 32 BIT REAL DATA*/
        $CR CRW
        $CRA HSIO
        $SEQ.REG HSIOINT,HIOPC; /*AND USE COUNTER A*/

        $SEQ CONT
        $CR CRW
        $CRA HSIO
        $SEQ.REG ENR16,SET;     /*ENABLE HSIO IN I/O WRITE MODE*/

        $SEQ CONT;              /* *** MYSTERY CODE TO LOAD COUNTER A=0 INT VECTOR *** */

        $SEQ CONT
        $CR ,IBMPC;             /*IBM PC SELECTED AS SOURCE*/

        $SEQ CONT
        $CR CRW
        $CRA PCTRAN
        $SEQ.REG ENR17, SET;    /*SET PGO BIT (LOW) TO BEGIN TRANSFER*/

        $SEQ ENAIR;             /*ENABLE USEQUENCER INTERRUPTS*/


/*COUNTER A ZERO INTERRUPT FUNCTION:
        DISABLE INTERRUPTS
        CLEAR PGO BIT
        RESET ALL BUSY FLAGS
        RESET COUNTER INTERRUPT VECTOR
        RESET ALL USED PORTS
        ENABLE INTERRUPTS
*/

PROGRAM ENDS
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